[devscripts] 03/03: licensecheck: don't die when --encoding is wrong
Osamu Aoki
osamu at debian.org
Sat Mar 26 16:54:46 UTC 2016
Hi,
On Fri, Mar 25, 2016 at 01:17:44PM +0000, dod at debian.org wrote:
> This is an automated email from the git hooks/post-receive script.
>
> dod pushed a commit to branch master
> in repository devscripts.
>
> commit 2dbe747c5af80a2690d351a0e1d978096ee5b18d
> Author: Dominique Dumont <dod at debian.org>
> Date: Fri Mar 25 14:09:33 2016 +0100
>
> licensecheck: don't die when --encoding is wrong
Yah, ... that is good.
> Turns out that licensecheck died when the --encoding utf8 option was used
> to read a latin1 file (or any ISO encoding) with this error:
> utf8 "\xFC" does not map to Unicode at /usr/bin/licensecheck line 415
>
> This behavior breaks "cme update dpkg-copyright" when some files are
> encoded in latin1 (cme always uses the --encoding utf8 option).
>
> When --encoding is used, licensecheck will now attempt to read the file
> with latin1 and then binary encoding.
But can latin1 decoding ever fail? Guessing an encoding is black magic... :-)
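To make the question concrete: latin1 (ISO-8859-1) maps each of the 256 byte values to the code point with the same number, so decoding any byte string as latin1 succeeds, and a binary fallback placed after it would never be reached. A quick check, in Python purely for illustration (licensecheck itself is Perl, and this is not its code):

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin1 maps byte N to code point N, so this decode cannot raise.
text = all_bytes.decode("latin1")
assert len(text) == 256
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))

# UTF-8, by contrast, rejects the lone 0xFC byte from the error quoted above.
try:
    b"\xfc".decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)  # prints: utf-8 failed: invalid start byte
```

So once latin1 is in the chain, the interesting decision is not whether it decodes but whether the result looks like text.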
FYI: my debmake internal file reader reads files like this:
* Read as UTF-8.
* If that fails, read as latin1, then:
  * delete every character matching '[\s!-~]' (whitespace and printable ASCII), leaving only the control and non-ASCII characters in non_ascii;
  * if len(non_ascii) > 25% * len(all_line), go on as binary;
  * otherwise, go on as latin1.
* If it succeeds, go on as UTF-8.
I used to identify a binary file as one containing \000, but the above
seems to work better.
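The steps listed above can be sketched in Python roughly as follows. This is a minimal illustration under assumptions, not debmake's actual code: the function name classify, its (text, kind) return value, returning None for binary, and the threshold parameter are all hypothetical; only the UTF-8 then latin1 order, the '[\s!-~]' pattern and the 25% cut-off come from the description.

```python
import re

# Whitespace plus printable ASCII ('!' through '~'), as in the rule above.
ASCII_TEXT = re.compile(r"[\s!-~]")


def classify(data, threshold=0.25):
    """Hypothetical helper: return (text, kind), kind in utf-8/latin1/binary."""
    # Step 1: try UTF-8 first; if it decodes, go on as UTF-8.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Step 2: latin1 always decodes, one code point per byte.
    text = data.decode("latin1")
    # Delete whitespace and printable ASCII; what remains are control
    # and non-ASCII characters.
    non_ascii = ASCII_TEXT.sub("", text)
    # Step 3: too many of those means the file is treated as binary.
    if len(non_ascii) > threshold * len(text):
        return None, "binary"
    return text, "latin1"


print(classify(b"hello\n"))                    # ('hello\n', 'utf-8')
print(classify("Müller\n".encode("latin1")))  # ('Müller\n', 'latin1')
print(classify(bytes(range(128, 256))))       # (None, 'binary')
```

A ratio test like this tolerates a latin1 file with a few accented names, which a strict "any non-UTF-8 byte means binary" rule would reject.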
More information about the devscripts-devel mailing list