licensecheck: new BSD detection algorithm
Benjamin Drung
bdrung at debian.org
Fri Oct 19 00:00:33 UTC 2012
On Friday, 12.10.2012, at 15:36 +1100, Dmitry Smirnov wrote:
> Hello team,
>
> As you may know, licensecheck has problems detecting BSD licenses.
>
> Basically, it uses regexes to search for known clauses.
> This is wrong -- imagine a situation where someone adds the following
>
> "Software must not be sold without prior written permission."
>
> to the classic BSD-2-clause license.
> licensecheck will then find 2 unmodified clauses and incorrectly report the
> license as BSD-2-clause (using "licensecheck" notation).
>
> I think I have an elegant solution to this: capture everything between the first
> paragraph and the disclaimer into a variable and remove the known clauses. If
> anything is left, report the license as "BSD-N-clause (modified)".
>
> The major improvement is that any license detected as "BSD-N-clause" is
> guaranteed to be unmodified.
>
> The proposed implementation is in the attached patch, which is
> meant to be applied to "jessie" on top of my previous patches plus the attached
> 0001[..].patch, which adds another case to GPL detection.
> (I didn't try applying the new BSD patch to the current state of "jessie".)
>
> You can also try the attached licensecheck, where this is already implemented.
> (It is worth trying with the "--tests" argument. There will be some noise from
> unrelated failing test cases.)
>
> I put the new BSD detection algorithm before the old ones, so if the new code
> recognises the license it removes its text from further processing; otherwise
> it falls back gracefully.
>
> The new detection also introduces a unified loop for license detection, as I
> don't much like the sequence of if-elses that the old code uses.
>
> Feedback is very welcome. (Benjamin? Adam? James?)
I like the idea of trying to match the complete license. Changing one
word (or a few more) in a license can create a new license or make it
non-free. Therefore we should not generate false positives.
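
To make the proposal concrete, here is a rough sketch of the idea described
above: capture the text between the first paragraph and the disclaimer, strip
the known clauses, and flag whatever is left. It is written in Python for
illustration only and is not the attached patch or licensecheck's actual code.
The clause patterns, the names KNOWN_CLAUSES, classify_bsd and detect, and the
"UNKNOWN" fallback are all assumptions, and the detector loop is simplified
(it does not remove recognised text from further processing).

```python
import re

# Hypothetical, simplified clause patterns (matched on lower-cased,
# whitespace-normalised text). The real licensecheck patterns differ.
KNOWN_CLAUSES = [
    r"redistributions? of source code must retain the above copyright notice, "
    r"this list of conditions and the following disclaimer\.",
    r"redistributions? in binary form must reproduce the above copyright "
    r"notice, this list of conditions and the following disclaimer in the "
    r"documentation and/or other materials provided with the distribution\.",
    r"neither the name of .+? nor the names of (?:its|their) contributors may "
    r"be used to endorse or promote products derived from this software "
    r"without specific prior written permission\.",
]

# Assumed markers for the end of the first paragraph and the disclaimer start.
HEADER = r"provided that the following conditions are met:"
DISCLAIMER = r"this software is provided by"


def classify_bsd(text):
    """Return 'BSD-N-clause', 'BSD-N-clause (modified)', or None."""
    norm = re.sub(r"\s+", " ", text).lower()
    match = re.search(HEADER + r"(.*?)" + DISCLAIMER, norm)
    if not match:
        return None  # not BSD-shaped: let other detectors try
    body = match.group(1)
    found = 0
    for pattern in KNOWN_CLAUSES:
        # Remove each known clause at most once and count the hits.
        body, hits = re.subn(pattern, " ", body, count=1)
        found += hits
    if found == 0:
        return None
    # Only list markers such as "1." should remain; any letters left over
    # mean that something was added to or changed in the license.
    if re.search(r"[a-z]", body):
        return "BSD-%d-clause (modified)" % found
    return "BSD-%d-clause" % found


def detect(text, detectors=(classify_bsd,)):
    """Unified loop: the first detector that returns a result wins."""
    for detector in detectors:
        result = detector(text)
        if result is not None:
            return result
    return "UNKNOWN"


BSD2 = """Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS".
"""

# The same license with the extra sentence from the example above.
MODIFIED = BSD2.replace(
    "\nTHIS SOFTWARE",
    "3. Software must not be sold without prior written permission.\n"
    "\nTHIS SOFTWARE",
)

print(detect(BSD2))      # BSD-2-clause
print(detect(MODIFIED))  # BSD-2-clause (modified)
```

With this shape, a plain "BSD-N-clause" result is only returned when nothing
but the known clauses sits between the header and the disclaimer, which is the
guarantee the proposal is after.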
--
Benjamin Drung
Debian & Ubuntu Developer