licensecheck: new BSD detection algorithm

Benjamin Drung bdrung at debian.org
Fri Oct 19 00:00:33 UTC 2012


On Friday, 12 October 2012, at 15:36 +1100, Dmitry Smirnov wrote:
> Hello team,
> 
> As you may know, licensecheck has problems detecting BSD licenses.
> 
> Basically, it uses regexes to search for known clauses.
> This is wrong: imagine a situation where someone adds the following
> 
> 	"Software must not be sold without prior written permission."
> 
> to the classic BSD-2-clause license.
> Licensecheck will then find two unmodified clauses and incorrectly report the
> license as BSD-2-clause (using "licensecheck" notation).
> 
> I think I have an elegant solution to this: capture everything between the
> first paragraph and the disclaimer into a variable and remove the known
> clauses. If anything is left, report the license as "BSD-N-clause (modified)".
> 
> The major improvement is that a detected "BSD-N-clause" will be
> guaranteed to be an unmodified license.
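
To illustrate the approach described above, here is a minimal sketch in
Python (licensecheck itself is written in Perl, and the attached patch is
not reproduced here). The clause patterns, the marker strings and the names
KNOWN_CLAUSES, PREAMBLE, DISCLAIMER, SAMPLE and classify_bsd are all
assumptions made for illustration, not the patterns the patch uses.

```python
import re

# Hypothetical, simplified clause patterns (lower-case, single-spaced).
KNOWN_CLAUSES = [
    r"redistributions? of source code must retain the above copyright notice, "
    r"this list of conditions and the following disclaimer\.",
    r"redistributions? in binary form must reproduce the above copyright notice, "
    r"this list of conditions and the following disclaimer in the documentation "
    r"and/or other materials provided with the distribution\.",
]

# Assumed markers for the end of the first paragraph and the start of the disclaimer.
PREAMBLE = r"are permitted provided that the following conditions are met:"
DISCLAIMER = r"this software is provided"

SAMPLE = """Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS".
"""


def classify_bsd(text):
    """Return 'BSD-N-clause', 'BSD-N-clause (modified)', or None."""
    # Normalise case and whitespace so line wrapping does not matter.
    flat = re.sub(r"\s+", " ", text.lower())
    # Capture everything between the first paragraph and the disclaimer.
    match = re.search(PREAMBLE + r"(.*?)" + DISCLAIMER, flat)
    if match is None:
        return None
    body = match.group(1)
    count = 0
    for pattern in KNOWN_CLAUSES:
        # Remove each known clause and count the ones that were present.
        body, hits = re.subn(pattern, " ", body)
        if hits:
            count += 1
    if count == 0:
        return None
    # Drop list markers such as "1.", "(2)", "*" or "-"; anything else is extra text.
    leftover = re.sub(r"[\s\d.()*\-]+", "", body)
    name = "BSD-%d-clause" % count
    return name + " (modified)" if leftover else name
```

With this sketch, classify_bsd(SAMPLE) yields "BSD-2-clause", while the same
text with the extra "must not be sold" clause added before the disclaimer
yields "BSD-2-clause (modified)".
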
> 
> The proposed implementation is in the attached patch, which is meant to be
> applied to "jessie" on top of my previous patches plus the attached
> 0001[..].patch, which adds another case to GPL detection.
> (I did not try applying the new BSD patch to the current state of "jessie".)
> 
> You can also try the attached licensecheck, where this is already implemented.
> (It is worth trying with the "--tests" argument. There will be some noise from
> unrelated failed test cases.)
> 
> I put the new BSD detection algorithm before the old ones, so if the new code
> recognises the license, it removes its text from further processing; otherwise
> it falls back gracefully.
> 
> The new detection also introduces a unified loop for license detection, as I
> don't quite like the sequence of if-elses that the old code uses for detection.
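
The ordering and fallback described above could look roughly like the
following Python sketch. It is only an illustration under assumptions: the
detector functions, their regexes and the DETECTORS list are hypothetical
and far cruder than anything in licensecheck.

```python
import re


def detect_bsd_strict(text):
    """Hypothetical new-style detector: returns (name, remaining_text) or None."""
    match = re.search(
        r"Redistribution and use in source and binary forms.*?SUCH DAMAGE\.",
        text,
        re.S,
    )
    if match is None:
        return None
    # Remove the recognised license text so later detectors do not see it.
    return "BSD", text[:match.start()] + text[match.end():]


def detect_gpl_legacy(text):
    """Hypothetical old-style detector: matches a phrase, consumes nothing."""
    if re.search(r"GNU General Public License", text):
        return "GPL", text
    return None


# Newer, stricter detectors come first; older ones act as the fallback.
DETECTORS = [detect_bsd_strict, detect_gpl_legacy]


def detect_licenses(text):
    """Run every detector in order over whatever text is still unclaimed."""
    found = []
    for detector in DETECTORS:
        result = detector(text)
        if result is None:
            continue  # not recognised: fall through to the next detector
        name, text = result
        found.append(name)
    return found or ["UNKNOWN"]
```

Adding a detector then means appending one function to the list instead of
extending a chain of if-elses.
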
> 
> Feedback is very welcome. (Benjamin? Adam? James?)

I like the idea of trying to match the complete license. Changing one
word (or a few more) in a license can create a new license or make it
non-free. Therefore we should not generate false positives.

-- 
Benjamin Drung
Debian & Ubuntu Developer