Bug#794282: devscripts: licensecheck should skip only binary files (i.e. include e.g. Postscript)

Jonas Smedegaard dr at jones.dk
Fri Jul 31 20:41:26 UTC 2015


Package: devscripts
Version: 2.15.6
Severity: important

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Recently, licensecheck started accepting only files whose MIME type is text/* or application/xml.

The only reason for that check is to sanity-check a function that converts
strings to utf-8.

Converting as much as possible to utf-8 is an improvement, but giving up on
other files is a regression: if a user asks to scan a (seemingly) binary
file, then that is what the user wants to do.

Concretely, recent licensecheck no longer scans Postscript files or
dumps of ICC metadata in the ghostscript package, which could be converted
to utf-8 but do not match the MIME white-list. It also skips Postscript
files containing embedded binary parts, and ancient C source files
containing an 0x00F char, seemingly from an earlier broken conversion of
the Swedish Ö in the surname of a copyright holder (Scandinavian Ö and Ø
commonly get mangled when latin1 is wrongly treated as some Windows charset).

To convert as much as possible to utf-8, MIME-based white-listing should be
changed to encoding-based black-listing, e.g. by changing line 326 from this:

    if ($mime =~ m!(?:text/[\w-]+|application/xml); charset=([\w-]+)!) {

to this:

    if ($mime =~ m/; charset=((?!binary)(?!unknown)[\w-]+)/) {
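
To illustrate the difference, here is a small stand-alone Perl sketch (not part of licensecheck) that applies the current regex and the proposed one to a few sample strings in the style of `file --mime` output. The sample strings and the helper names charset_whitelist and charset_blacklist are made up for illustration; real `file` output may differ between versions.

```perl
use strict;
use warnings;

# Illustrative `file --mime` style strings (made up, not captured from a real run).
my @samples = (
    'text/plain; charset=us-ascii',
    'text/x-c; charset=iso-8859-1',
    'application/postscript; charset=us-ascii',
    'application/octet-stream; charset=binary',
    'application/postscript; charset=unknown-8bit',
);

# Hypothetical helper: current rule, a MIME-type white-list.
# Returns the charset, or undef if the file would be skipped.
sub charset_whitelist {
    my ($mime) = @_;
    return $mime =~ m!(?:text/[\w-]+|application/xml); charset=([\w-]+)! ? $1 : undef;
}

# Hypothetical helper: proposed rule, an encoding black-list.
# Any charset is accepted unless it starts with "binary" or "unknown".
sub charset_blacklist {
    my ($mime) = @_;
    return $mime =~ m/; charset=((?!binary)(?!unknown)[\w-]+)/ ? $1 : undef;
}

for my $mime (@samples) {
    my $old = charset_whitelist($mime) // 'skipped';
    my $new = charset_blacklist($mime) // 'maybe-binary';
    printf "%-45s old: %-12s new: %s\n", $mime, $old, $new;
}
```

With these samples, the Postscript file declared as us-ascii is skipped by the old rule but gets us-ascii under the new one, while the binary and unknown-8bit samples fall through to the 'maybe-binary' fallback instead of being skipped.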

To avoid skipping other files, replace line 332 from this:

	next;

to this:

	$charset = 'maybe-binary';

and replace line 240 from this:

        my $data = decode($charset,$_);

to this:

        my $data = $_;
        $data = decode($charset,$data) unless ( $charset eq 'maybe-binary' );

(the latter part could be improved, e.g. by running the data through the
"strings" command).
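
A rough sketch of that idea in Perl, assuming the raw file contents are available as a byte string: in the 'maybe-binary' case it keeps only runs of at least four printable ASCII characters (approximating what "strings" prints by default) instead of calling decode. The helper names printable_runs and prepare_text and the sample data are hypothetical, not taken from licensecheck.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return runs of at least $min printable ASCII
# characters (tab, space and 0x21-0x7E), one per line, roughly like strings(1).
sub printable_runs {
    my ($data, $min) = @_;
    $min //= 4;
    my @runs = $data =~ /([\t\x20-\x7E]{$min,})/g;
    return join "\n", @runs;
}

# Hypothetical wrapper: decode known charsets, scrape text from the rest.
sub prepare_text {
    my ($charset, $raw) = @_;
    return printable_runs($raw) if $charset eq 'maybe-binary';
    return decode($charset, $raw);
}

# Made-up Postscript-like input with an embedded binary part.
my $raw = "%!PS-Adobe-3.0\n\x00\x01\x02\xFF\xFE"
        . "% Copyright (C) 1999 Example Author\n";
print prepare_text('maybe-binary', $raw), "\n";
```

Scraping printable runs drops any non-ASCII characters in the binary case, so a name like the mangled Swedish one above would lose its Ö; that trade-off seems acceptable for finding copyright and license statements, but it is a design choice of this sketch.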

Since this is a regression from earlier releases, I have raised the severity
to important.

 - Jonas

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJVu91wAAoJECx8MUbBoAEhhSQQAIGdyAIXw26qaLIdLSDWYSWA
8/mEkMOGUG15oatdL8nuW5mFSzToOrVVJggLrHTRYBE7Hy55OajjW5RIK2v0d30S
RM7syI6gViYRAgMdx/ldahEQ4pjpVYNkorO7nOlBIEezdAR+iD4GcNcyKFy9QALG
ZxZ3fR/onf9Cw9VK3LF03hski47Indf9b9Mbi3latm3hvWFDakKzb+Desc/Zgndg
egzBXnOnKoynNPPRNYyQ0PuuZBszw9W1g66j4MDEhTJjXeZTxTVJqbHyyzTW1ael
jSxOThBaS5QvDBQWyXO2mYLbPW55cyYhP+jJSp0QLXwzRA+e0UKAARBOC1hZ8FjV
MIvcXbcyCKIQCOAdOAEeKKz684HTh8y+WkR9f12tA9jcLj09FEDio1RIYZzfELwQ
OHG0MfzuWYz24V1pFtIoEvqIgINUc3tLzWBso0U7vEfSronj51wXh1ov5PjXDZv4
0uiukJJ0CAScWDaVTDB4WnUpdA3sJu0J+gXrdrByj7vClcQSvFwBgeugauNJgvPR
zUOETLLn1thlNtyJbrgIS4p/ntvdQicGGW066yYVGpno6+kJCYHOjZEwNbNJ2jJ3
jyovbzk4bdA/1sRtNVvR/tgrrgjxWT6RoaPKhDVr2zEkYLVBJQJgMWHpRNJhxDmR
S7knKMavS1W4SEIJ2tlq
=6qFM
-----END PGP SIGNATURE-----
