[Build-common-hackers] licensecheck and binary blobs

IOhannes zmölnig zmoelnig at iem.at
Fri Aug 19 14:22:34 UTC 2011

On 08/19/2011 03:57 PM, Jonas Smedegaard wrote:
> On 11-08-19 at 03:34pm, IOhannes zmölnig wrote:
> [fine and accurate problem description snipped]
>> to solve the problem on my side, i could either create a longish (and 
>> complicated and hard to maintain) DEB_COPYRIGHT_CHECK_IGNORE_REGEX, or 
>> add debian/copyright_hints to debian/source/include-binaries
>> personally i think both ideas are cruft, as they only try to work around 
>> the unexpectedly binary output of licensecheck (and it is repetitive 
>> work that should be taken care of by cdbs).
>> instead i would suggest to sanitize the output of licensecheck, in 
>> order to keep debian/copyright_(new)?hints proper text files.
> I tried that in the past, but then non-ascii names of copyright holders 
> got crippled or even stripped completely.

i see.

> I see no way really to make licensecheck a fully automated routine, 
> because it is fundamentally dealing with crufty data.

however, i don't see a real need that everything is fully automated.
as i see it, licensecheck is only there to help the packagers in their
task of getting the license information right, rather than doing the
work for them.
e.g. licensecheck will generate a _template_ debian/copyright_newhints,
that the packager can then use to create a proper and nice debian/copyright

if licensecheck fills the "Author" field with seemingly meaningless
(ascii) characters or leaves it alone, then this is annoying.
but it is equally annoying if it fills the "License" field with binary
data: in both cases the packager has to figure out what was actually
meant by inspecting the source file.

if we can agree that the work is the same, then it makes more sense to
try to avoid binary blobs in debian/copyright_hints, as binary data in
there will break dpkg-source as a side-effect!
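
to illustrate what i mean by sanitizing, here is a minimal sketch in
python. it is not how licensecheck or cdbs work today; the function name
sanitize_hint is made up for this example, and i assume the raw field is
meant to be UTF-8. it keeps printable non-ascii characters (so names of
copyright holders survive) and only replaces control characters and
undecodable bytes with "?":

```python
import unicodedata


def sanitize_hint(raw: bytes) -> str:
    """Turn the raw bytes of a hint field into plain text (hypothetical helper)."""
    # Assumption: the field is UTF-8. Bytes that cannot be decoded
    # become U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    cleaned = []
    for ch in text:
        if ch in "\n\t":
            # keep ordinary line structure
            cleaned.append(ch)
        elif ch == "\ufffd" or unicodedata.category(ch).startswith("C"):
            # control, format, private-use, surrogate or unassigned code
            # points, and the decoder's replacement character, are masked
            cleaned.append("?")
        else:
            # printable characters, including non-ascii letters, are kept
            cleaned.append(ch)
    return "".join(cleaned)
```

the packager still has to look at the source file whenever a "?" shows
up, but the hints file itself stays a text file.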

> Would be cool to be able to add (not replace) an ignore regex.  I just 
> haven't figured out a sensible way to do that yet.
> For now I recommend to use the clumsy approach of overriding 

even if it was to be done easily (e.g. by adding to the regexp rather
than replacing), the problem remains that you are deliberately not
looking at certain files.
e.g. if a file generates binary blobs for the "License" information, but
the "Author" field is fine, and then the author of the file changes,
this will go unnoticed!

and then, people are lazy.
e.g. if there is a project with about 120 png files, all of them
triggering a false positive, it is likely that the packager will simply
add ".*\.png" to the regexp.
any newly added .png file will then go unnoticed!
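
a small python sketch of that effect, under stated assumptions: the file
names are invented, files_to_check is a hypothetical helper, and the
matching is plain regex search on the path, which is only a stand-in for
however cdbs really applies DEB_COPYRIGHT_CHECK_IGNORE_REGEX:

```python
import re


def files_to_check(paths, ignore_regex):
    """Return the paths a copyright check would still look at (hypothetical)."""
    ignore = re.compile(ignore_regex)
    return [p for p in paths if not ignore.search(p)]


# hypothetical source tree at the time the ignore regex was written
tree = ["src/main.c", "icons/ok.png", "icons/cancel.png"]
lazy_ignore = r".*\.png$"

checked_before = files_to_check(tree, lazy_ignore)

# upstream later adds an image, possibly under a different license
tree.append("icons/third-party-logo.png")
checked_after = files_to_check(tree, lazy_ignore)

# the set of inspected files did not change, so the new file is never seen
print(checked_before == checked_after)
```

the list of inspected files is identical before and after the new image
appears, so nothing in the hints would alert the packager.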

> (the other approach is too invasive in 
> my opinion: affects other things than just the copyright-check routine).

could you elaborate on that a bit?
i just decided to go for that route in a package, but if it indeed may
cause problems, i will switch back to (sigh) the regexp...

