[build-path-prefix-map-spec] 34/50: Require "total and injective" instead of "decode+encode and encode+decode is the identity"
Ximin Luo
infinity0 at debian.org
Fri Mar 10 15:17:21 UTC 2017
This is an automated email from the git hooks/post-receive script.
infinity0 pushed a commit to branch master
in repository build-path-prefix-map-spec.
commit b462ea6fb8de2aba6f4168c5d13ba31cda79e420
Author: Ximin Luo <infinity0 at debian.org>
Date: Thu Feb 23 19:35:23 2017 +0100
Require "total and injective" instead of "decode+encode and encode+decode is the identity"
We don't expect many people to implement both decode and encode in the same
program and it's also a bit confusing to talk about the composition of both the
data structure encoding and character encoding.
Also move some notes downwards for visual clarity.
---
spec-draft.rst | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/spec-draft.rst b/spec-draft.rst
index 9fa383d..73e5e2c 100644
--- a/spec-draft.rst
+++ b/spec-draft.rst
@@ -33,13 +33,12 @@ platforms, these string types are the types of both filesystem paths and
environment variables on that platform.
When implementing this data structure encoding, either (a) you MUST directly
-operate on the string types described above *without* also decoding or encoding
-them using a character encoding (e.g. UTF-8 or UTF-16); or (b) if you must use
-a character encoding e.g. because your language's standard libraries force you
-to, then you MUST ensure that the overall encode+decode and decode+encode
-operations always exactly preserves the original structure or value, even if it
-contains data that was invalid for the character encoding that was used. See
-[TODO link] for further details and guidance on how to do this.
+operate on the system string types described above *without* also decoding or
+encoding them using a character encoding such as UTF-8 or UTF-16; or (b) if you
+must use a character encoding e.g. because your language's standard libraries
+force you to, then either it is total and injective over the system string type
+[0]_, or you MUST raise a parse error for inputs where it is undefined or not
+injective. See [TODO link] for further details and guidance on how to do this.
The encoding is as follows:
@@ -48,10 +47,6 @@ The encoding is as follows:
Empty subsequences between ``:`` characters, or between a ``:`` character and
either the left or right end of the envvar, are valid and are ignored. [1]_
- .. [1] This is to make it easier for producers to append values, e.g. as in
- ``envvar += ":" + encoded_pair`` which would be valid even if envvar
- is originally empty.
-
- Each encoded list item contains exactly one ``=`` character, that separates
encoded pair elements.
@@ -63,10 +58,6 @@ The encoding is as follows:
The encoded pair elements may be empty; this does not need special-casing if
the rest of the document is implemented correctly.
- .. [2] This is to "fail early" in the cases that a naive producer does not
- encode characters like ``=`` but the build path or target path does
- actually contain them.
-
- Each encoded pair element is encoded with the following mapping:
1. ``%`` → ``%p``
@@ -175,6 +166,24 @@ Example source code is available on the above page, as well as in runnable form
on `<https://github.com/infinity0/rb-prefix-map>`_. FIXME use alioth link
+Notes
+=====
+
+.. [0] In practice, this means any two byte sequences that are invalid UTF-8,
+ or ``wchar_t`` sequences that are invalid UTF-16, are decoded into distinct
+ application-level character string values. This is not satisfied by most
+ standard Unicode decoding strategies, which is to replace invalid input
+ sequences with ``U+FFFD REPLACEMENT CHARACTER``.
+
+.. [1] This is to make it easier for producers to append values, e.g. as in
+ ``envvar += ":" + encoded_pair`` which would be valid even if envvar is
+ originally empty.
+
+.. [2] This is to "fail early" in the cases that a naive producer does not
+ encode characters like ``=`` but the build path or target path does
+ actually contain them.
+
+
References
==========
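
Illustration (not part of the commit): a minimal Python sketch of what
footnote [0] above means by "total and injective". It assumes a POSIX-style
system string type (arbitrary bytes) and uses UTF-8 as the character encoding.
The helper names decode_replace, decode_escape and encode_escape are
hypothetical and do not appear in the spec. The sketch contrasts the common
U+FFFD replacement strategy, which is not injective, with Python's
"surrogateescape" error handler (PEP 383), which maps each undecodable byte to
a distinct lone surrogate.

```python
def decode_replace(raw: bytes) -> str:
    """Decode with U+FFFD replacement: total, but NOT injective."""
    return raw.decode("utf-8", errors="replace")


def decode_escape(raw: bytes) -> str:
    """Decode with surrogateescape: total and injective over bytes."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_escape(text: str) -> bytes:
    """Inverse of decode_escape for strings that it produced."""
    return text.encode("utf-8", errors="surrogateescape")


# Two distinct paths, both containing a byte that is invalid UTF-8.
path_a = b"/build/\xff"
path_b = b"/build/\xfe"

# Replacement collapses both invalid bytes to U+FFFD, so the two
# distinct inputs become indistinguishable after decoding.
collides = decode_replace(path_a) == decode_replace(path_b)

# surrogateescape keeps them distinct and lets the original bytes
# be recovered exactly.
distinct = decode_escape(path_a) != decode_escape(path_b)
round_trips = encode_escape(decode_escape(path_a)) == path_a
```

An implementation that cannot use a decoder of the second kind would, under
option (b) of the revised text, have to raise a parse error for such inputs
rather than decode them lossily.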
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/build-path-prefix-map-spec.git