[devscripts] 03/03: licensecheck: don't die when --encoding is wrong

Osamu Aoki osamu at debian.org
Sat Mar 26 16:54:46 UTC 2016


Hi,

On Fri, Mar 25, 2016 at 01:17:44PM +0000, dod at debian.org wrote:
> This is an automated email from the git hooks/post-receive script.
> 
> dod pushed a commit to branch master
> in repository devscripts.
> 
> commit 2dbe747c5af80a2690d351a0e1d978096ee5b18d
> Author: Dominique Dumont <dod at debian.org>
> Date:   Fri Mar 25 14:09:33 2016 +0100
> 
>     licensecheck: don't die when --encoding is wrong

Yah, ... that is good.
   
>     Turns out that licensecheck died with --encoding utf8 option was used
>     to read a latin1 file (or any ISO encoding) with this error:
>     utf8 "\xFC" does not map to Unicode at /usr/bin/licensecheck line 415
>     
>     This behavior breaks "cme update dpkg-copyright" when some files are
>     encoded in latin1 (cme always use --encoding utf8 option).
>     
>     When --encoding is used, licensecheck will now attempt to read the file
>     with latin1 and then binary encoding.

But does reading as latin1 ever fail?  Guessing the encoding is black magic ... :-)

FYI: For my debmake internal file reader, I read files like this:
 * read as UTF-8
   * If error, read as latin1
     * strip every character matching '[\s!-~]' to leave only the non-ascii ones
       * if len(non_ascii) > 25% * len(all_line),
          * then go on as binary
          * otherwise, go on as latin1
   * If success, go on as UTF-8

I used to treat any file containing \000 as binary, but the above seems
to be better.
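
Roughly, the steps listed above look like this in Python.  This is only a
sketch and not the actual debmake code: the function name guess_encoding
and the threshold parameter are made up for illustration, and I assume the
whole file content is already in memory as bytes.

```python
import re

# Whitespace plus printable ASCII ('!' to '~').  re.ASCII keeps \s from
# also matching latin1 characters such as the no-break space (0xA0).
ASCII_TEXT = re.compile(r'[\s!-~]', re.ASCII)


def guess_encoding(data: bytes, threshold: float = 0.25) -> str:
    """Return 'utf-8', 'latin-1' or 'binary' for the given file content.

    Hypothetical helper; the name and the threshold argument are
    assumptions for this example.
    """
    try:
        data.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        pass

    # Decoding as latin-1 cannot raise in Python: every byte value
    # 0x00-0xFF maps to a code point.  So the real decision is the ratio.
    text = data.decode('latin-1')

    # Remove whitespace and printable ASCII; what remains is "non-ascii"
    # (control characters and bytes >= 0x80).
    non_ascii = ASCII_TEXT.sub('', text)

    if len(non_ascii) > threshold * len(text):
        return 'binary'
    return 'latin-1'
```

For example, guess_encoding(b'M\xfcller') fails the UTF-8 step, has one
non-ascii character out of six, and so goes on as latin1.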


