[Pkg-xfce-devel] Bug#654468: Bug#645191: update on waf binary data

Sun Apr 15 18:32:34 UTC 2012

* Yves-Alexis Perez [2012-04-15 09:18 +0200]:
> On ven., 2012-03-23 at 23:39 +0100, Carsten Hey wrote:
> > I think we should drop ftpmaster from CC in further mails.
>
> Maybe, since they don't seem to care about this.

They provided an IMHO acceptable, but not ideal, way (because there
does not seem to be an ideal way) to handle this.  I suggested
dropping them from CC because there is nothing relevant yet they could
comment on, which presumably is also the reason they did not comment up
to now ;)

> Well, parsing python might not be an option, but what about:
>
> egrep -a "^C[1|2]='..'" waf
> C1='#*'
> C2='#%'

We need to be able to repack a changed wafadmin directory into an
existing waf script to gain anything.  To repack, C1 and C2 need to be
adapted.  If adapting C1 and C2 is done via regular expressions, it
would fail, possibly without being noticed, if, for example, the
variable names in future waf versions change or if the character ' is
part of this variable and you did not handle this in your regular
expression.  All in all, this is a rather natural approach to this
problem, but it is anything but robust.
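To illustrate the failure mode, here is a minimal Python sketch.  The
pattern is only modelled on the egrep call quoted above, and the two
"bad" inputs (an escaped quote inside C1, renamed variables R1/R2) are
hypothetical; they are not taken from any real waf release.

```python
import re

# Hypothetical pattern modelled on the quoted egrep call: a line that
# consists of C1 or C2, an equal sign and exactly two quoted characters.
PATTERN = re.compile(rb"^C([12])='(..)'$", re.MULTILINE)

def extract_replacements(script: bytes) -> dict:
    """Return a dict with one entry per C1/C2 line the pattern matches."""
    return {m.group(1): m.group(2) for m in PATTERN.finditer(script)}

ok = b"#!/usr/bin/env python\nC1='#*'\nC2='#%'\n"
quoted = b"C1='#\\''\nC2='#%'\n"   # C1 contains an escaped ' character
renamed = b"R1='#*'\nR2='#%'\n"    # made-up future variable names

for sample in (ok, quoted, renamed):
    # No exception is raised for the bad inputs; entries are just missing.
    print(extract_replacements(sample))
```

Only the first input yields both values.  The other two return an
incomplete or empty result without any error, which is exactly the kind
of failure that could go unnoticed.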

It could be done using regular expressions, but I assume that the effort
required to ensure that it works correctly and to update it is way more
than the effort of just shipping an unpacked waf in every waf-using
package.  Besides this, the probability of unnoticed related errors is
presumably unreasonably high.

A way to handle this that would possibly make everybody happy would
require convincing waf upstream to adapt waf.

As already mentioned, the reason that we are not able to repack waf
scripts in a reasonable way using only essential tools is that waf
scripts are not clearly divided into a data part and a non-data part,
i.e., C1 and C2 contain information that one would expect to be in
a header and not in a script.

If waf scripts contained, instead of the variables C1 and C2, a header
like the one below, and parsed the header themselves to figure out which
replacements to do, then tools that unpack and/or repack waf scripts in
a reliable way could easily be written.

  #===
  # Waf-Data-Format: 1.0
  # Waf-Archive-Type: tar.gz
  # Waf-Archive-Base-Directory: wafadmin
  # Waf-Line-Feed-Replacement: ab
  # Waf-Carriage-Return-Replacement: xy
  #==>
  #...
  #<==

If such a header were used by waf upstream, it would be important
that there is exactly one space between the colon after the field name
and the field's data.  The reason for this is that a replacement string
could begin with a space character.  Introducing a way to escape some
characters would IMO be over-engineered.  Alternatively, the
(uppercased) hex values could be used instead of the real string, i.e.,
' m' would be written as 206D in the header.
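A reader for such a header could look roughly like the Python sketch
below.  The function names parse_header and unhex are made up for this
example, and the header is only the proposal from this mail, not
something waf implements.  The sketch treats everything after the single
space that follows the colon as the value, so a leading space in a
replacement string survives.

```python
import re

# "# Field-Name: value" with exactly one separating space; any further
# spaces belong to the value itself.
FIELD = re.compile(r"^# ([A-Za-z-]+): (.*)$")

def parse_header(lines):
    """Return (fields, index of the first data line) for the proposed header."""
    if not lines or lines[0] != "#===":
        raise ValueError("missing '#===' start line")
    fields = {}
    for i, line in enumerate(lines[1:], start=1):
        if line == "#==>":
            return fields, i + 1
        m = FIELD.match(line)
        if m is None:
            raise ValueError("malformed header line: %r" % line)
        fields[m.group(1)] = m.group(2)  # kept verbatim, leading spaces too
    raise ValueError("missing '#==>' line")

def unhex(value):
    """Decode the alternative uppercase hex notation into a string."""
    return bytes.fromhex(value).decode("latin-1")

header = [
    "#===",
    "# Waf-Data-Format: 1.0",
    "# Waf-Line-Feed-Replacement:  m",  # value is a space followed by m
    "#==>",
    "#...",
]
fields, start = parse_header(header)
print(repr(fields["Waf-Line-Feed-Replacement"]), start, repr(unhex("206D")))
```

A malformed line raises an error instead of being skipped, so a changed
format would be noticed rather than silently misread.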

Reasons to brute-force unused sequences instead of simply prefixing all
line feeds and all carriage returns with a number sign are:

 * Keep the size of the encoded string as small as possible.  Prefixing
   two of the possible 256 characters would enlarge the encoded string
   on average by 2/256 or 0.78%, given that the compression method is
   reasonable.

 * Some editors do not wrap lines by default.  One could consider a
   waf script that shows just one long unwrapped line instead of
   multiple lines (on average size/128 lines) when opened in such an
   editor to be more beautiful.

 * The data part ends before a line that only contains the string
   '#<=='.  If you encoded an archive of infinite size by the
   described prefixing, it would also contain this line _in_ the data
   part.  A way to fix this is to additionally prefix the equal sign
   with a number sign.  A presumably better way is to interpret the
   semantics of '#<==' as "the data part ends before the _last_ equal
   line in a comment block" and not "... before the _first_ equal line
   ...".
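The "last equal line" interpretation from the third point can be
sketched in Python as follows.  The function name data_part is made up,
and the sketch assumes that the script has already been split at line
feeds and that the comment block ends at the first line that does not
start with a number sign.

```python
def data_part(lines):
    """Return the lines between '#==>' and the last '#<==' of the block."""
    start = lines.index("#==>") + 1
    end = None
    for i in range(start, len(lines)):
        if not lines[i].startswith("#"):
            break              # the comment block is over
        if lines[i] == "#<==":
            end = i            # remember the most recent candidate
    if end is None:
        raise ValueError("no '#<==' line found in the comment block")
    return lines[start:end]

script = [
    "#===",
    "# Waf-Data-Format: 1.0",
    "#==>",
    "#abc",
    "#<==",        # part of the data, not the end marker
    "#def",
    "#<==",        # the real end marker
    "print('x')",
    "#<==",        # outside the comment block, ignored
]
print(data_part(script))
```

With the "first equal line" interpretation the data part would have
been cut off after '#abc' in this example.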

Perl one-liner filters to encode and decode the data part using the
described prefixing are:
    perl -e '$_ = do { local $/ = <> }; s/\n/\n#/sg; s/\r/\r#/sg; print "#", $_, "\n"'
    perl -e '$_ = do { local $/ = <> }; $_ = substr($_, 1, -1); s/\r#/\r/sg; s/\n#/\n/sg; print'

They can be used in the same way as all other filters:
    cat file | filter > result
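For comparison, the same prefixing can be written in Python, which might
be the more natural place for it since waf itself is a Python script.
This is only a sketch that mirrors the two Perl filters above; the
function names encode and decode are made up.

```python
def encode(data: bytes) -> bytes:
    """Prefix the data and every line feed / carriage return with '#'."""
    data = data.replace(b"\n", b"\n#").replace(b"\r", b"\r#")
    return b"#" + data + b"\n"

def decode(encoded: bytes) -> bytes:
    """Undo encode(): strip the outer '#' and newline, then the prefixes."""
    body = encoded[1:-1]
    return body.replace(b"\r#", b"\r").replace(b"\n#", b"\n")

sample = b"a\nb\r\nc#\n#d\r"
assert decode(encode(sample)) == sample
print(encode(sample))
```

The round trip also works for data that already contains number signs,
because only the one number sign directly after a line feed or carriage
return is removed when decoding.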

With this approach, the need for C1 and C2 (or the corresponding header
fields) would vanish.  The header would still be very useful, though.

The remaining non-trivial part, which I will not do since I think the
existing solution (shipping waf unpacked) is ugly but sufficient and
I don't even use waf, is to try to convince waf's upstream to add such
a header.  With such a header and the corresponding scripts, changes
between different Debian revisions would still not be as easy to review
as running "zrun interdiff *.diff.gz", but I don't think that this is
a blocker, as long as README.source contains easy recipes for changing
waf and reviewing these changes.

> Well, when needed because we need to patch the build script (like for
> the hppa issue) we can do that.

Being able to do something doesn't necessarily mean that it can be done
in an easy way.

Regards
Carsten

P.S.: Do whatever you want with this mail's content.  If anything
      I wrote in it (everything that is not quoted from your previous
      mail) is copyrightable, which I doubt, then it is licensed under
      the terms of the practically public-domain-equivalent license
      WTFPL 2.0.

P.P.S.: If you want to test if the above can be embedded into a python
        script, set the script's encoding to latin-1, as described in
        PEP 0263 - or just copy the second line of an existing waf
        script.