[build-path-prefix-map-spec] 34/50: Require "total and injective" instead of "decode+encode and encode+decode is the identity"

Ximin Luo infinity0 at debian.org
Fri Mar 10 15:17:21 UTC 2017


This is an automated email from the git hooks/post-receive script.

infinity0 pushed a commit to branch master
in repository build-path-prefix-map-spec.

commit b462ea6fb8de2aba6f4168c5d13ba31cda79e420
Author: Ximin Luo <infinity0 at debian.org>
Date:   Thu Feb 23 19:35:23 2017 +0100

    Require "total and injective" instead of "decode+encode and encode+decode is the identity"
    
    We don't expect many people to implement both decode and encode in the same
    program and it's also a bit confusing to talk about the composition of both the
    data structure encoding and character encoding.
    
    Also move some notes downwards for visual clarity.
---
 spec-draft.rst | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/spec-draft.rst b/spec-draft.rst
index 9fa383d..73e5e2c 100644
--- a/spec-draft.rst
+++ b/spec-draft.rst
@@ -33,13 +33,12 @@ platforms, these string types are the types of both filesystem paths and
 environment variables on that platform.
 
 When implementing this data structure encoding, either (a) you MUST directly
-operate on the string types described above *without* also decoding or encoding
-them using a character encoding (e.g. UTF-8 or UTF-16); or (b) if you must use
-a character encoding e.g. because your language's standard libraries force you
-to, then you MUST ensure that the overall encode+decode and decode+encode
-operations always exactly preserves the original structure or value, even if it
-contains data that was invalid for the character encoding that was used. See
-[TODO link] for further details and guidance on how to do this.
+operate on the system string types described above *without* also decoding or
+encoding them using a character encoding such as UTF-8 or UTF-16; or (b) if you
+must use a character encoding e.g. because your language's standard libraries
+force you to, then either it is total and injective over the system string type
+[0]_, or you MUST raise a parse error for inputs where it is undefined or not
+injective. See [TODO link] for further details and guidance on how to do this.
 
 The encoding is as follows:
 
@@ -48,10 +47,6 @@ The encoding is as follows:
   Empty subsequences between ``:`` characters, or between a ``:`` character and
   either the left or right end of the envvar, are valid and are ignored. [1]_
 
-  .. [1] This is to make it easier for producers to append values, e.g. as in
-         ``envvar += ":" + encoded_pair`` which would be valid even if envvar
-         is originally empty.
-
 - Each encoded list item contains exactly one ``=`` character, that separates
   encoded pair elements.
 
@@ -63,10 +58,6 @@ The encoding is as follows:
   The encoded pair elements may be empty; this does not need special-casing if
   the rest of the document is implemented correctly.
 
-  .. [2] This is to "fail early" in the cases that a naive producer does not
-         encode characters like ``=`` but the build path or target path does
-         actually contain them.
-
 - Each encoded pair element is encoded with the following mapping:
 
   1. ``%`` → ``%p``
@@ -175,6 +166,24 @@ Example source code is available on the above page, as well as in runnable form
 on `<https://github.com/infinity0/rb-prefix-map>`_. FIXME use alioth link
 
 
+Notes
+=====
+
+.. [0] In practice, this means any two byte sequences that are invalid UTF-8,
+    or ``wchar_t`` sequences that are invalid UTF-16, are decoded into distinct
+    application-level character string values. This is not satisfied by most
+    standard Unicode decoding strategies, which is to replace invalid input
+    sequences with ``U+FFFD REPLACEMENT CHARACTER``.
+
+.. [1] This is to make it easier for producers to append values, e.g. as in
+    ``envvar += ":" + encoded_pair`` which would be valid even if envvar is
+    originally empty.
+
+.. [2] This is to "fail early" in the cases that a naive producer does not
+    encode characters like ``=`` but the build path or target path does
+    actually contain them.
+
+
 References
 ==========
 
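
Illustration of note [0] in the diff above. This is a minimal sketch and not
part of the commit. It assumes a POSIX platform where the system string type
is a byte string, and it uses Python's built-in UTF-8 error handlers as
stand-ins for "a character encoding your standard library forces on you". The
sample paths and the helper name decode_or_fail are hypothetical.

```python
# Two distinct byte strings, each ending in a byte that is invalid in UTF-8.
a = b"/build/\xff"
b = b"/build/\xfe"

# "replace" is total but not injective: every invalid byte becomes U+FFFD,
# so the two different inputs decode to the same string.
collision = a.decode("utf-8", errors="replace") == b.decode("utf-8", errors="replace")

# "surrogateescape" maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN, so distinct inputs stay distinct and the original bytes can be
# recovered by encoding with the same handler.
safe_a = a.decode("utf-8", errors="surrogateescape")
safe_b = b.decode("utf-8", errors="surrogateescape")
distinct = safe_a != safe_b
round_trip = safe_a.encode("utf-8", errors="surrogateescape") == a


def decode_or_fail(raw: bytes) -> str:
    """Strict decoding: report a parse error where the decoding is undefined."""
    try:
        return raw.decode("utf-8")  # default error handler is "strict"
    except UnicodeDecodeError as exc:
        raise ValueError(f"parse error: input is not valid UTF-8: {exc}") from exc
```

Under these assumptions, the "replace" handler shows the behaviour that note
[0] says does not satisfy the requirement, "surrogateescape" shows a decoding
that is total and injective over byte strings, and decode_or_fail shows the
alternative of raising a parse error.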

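The list and pair rules visible in the diff context above (items separated by
":", empty items ignored, exactly one "=" per item) can be sketched as
follows. This is an assumption-laden illustration, not the reference
implementation: the function name split_prefix_map is hypothetical, the input
is taken as bytes so that no character encoding is involved, and the pair
elements are returned still encoded because the full element mapping is not
shown in this diff.

```python
from typing import List, Tuple


def split_prefix_map(envvar: bytes) -> List[Tuple[bytes, bytes]]:
    """Split an encoded prefix map into (left, right) pairs, still encoded."""
    pairs = []
    for item in envvar.split(b":"):
        if not item:
            # Empty subsequences between ":" characters, or at either end
            # of the envvar, are valid and are ignored.
            continue
        if item.count(b"=") != 1:
            # Fail early: each item must contain exactly one "=".
            raise ValueError(f"parse error: expected exactly one '=' in {item!r}")
        left, right = item.split(b"=")
        # Either element may be empty; no special-casing is needed.
        pairs.append((left, right))
    return pairs
```

For example, an empty envvar yields an empty list, and an envvar built by
repeatedly appending ":" plus an encoded pair parses without special handling
of the leading ":".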
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/build-path-prefix-map-spec.git


