Bug#760817: ssdeep: wrong scoring on two fuzzy hashes with same block sizes
Tsukasa #01 (Oi)
li at livegrid.org
Mon Sep 8 06:13:30 UTC 2014
Package: ssdeep
Version: 2.7-2
Severity: important
Tags: patch
Dear Maintainer,
ssdeep (and the libfuzzy2 Debian package) before version 2.10 has a bug
that can produce a wrong score when comparing two fuzzy hashes with the
same block size. This makes clustering and comparing files unreliable.
This bug was fixed in 2.10 by Jesse Kornblum
<research at jessekornblum.com>, but it is still not fixed in the Debian
versions (sid, unstable and stable).
I encountered this bug while clustering about 10M files based on ssdeep
hashes and I had to recluster all the files.
Sorry that I have no `natural' examples to reproduce the bug (I slightly
changed a parameter after building patched versions of ssdeep/libfuzzy2
2.7-2, and comparing the clusters would take about 2 months on 20 CPU
cores), but we can generate an `artificial' example by truncating the
second chunk of the fuzzy hashes.
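The truncation step can be done with a small POSIX shell function. This is a minimal sketch, not the script used to build the test file: the name `truncate_chunk2` is hypothetical, it assumes the `blocksize:hash:hash,filename` layout given in the header line of the test file, and it defaults to 7 characters only because that is the length of `E4uV9FX` in the example.

```sh
# Hypothetical helper (not part of ssdeep): truncate the second chunk of an
# ssdeep line of the form  blocksize:chunk1:chunk2,"filename"
# to its first N characters (default 7), leaving everything else unchanged.
truncate_chunk2() {
    line=$1
    n=${2:-7}
    hash=${line%%,*}        # blocksize:chunk1:chunk2
    rest=${line#"$hash"}    # ,"filename"
    bs=${hash%%:*}          # blocksize
    chunks=${hash#*:}       # chunk1:chunk2
    c1=${chunks%%:*}        # chunk1
    c2=${chunks#*:}         # chunk2
    c2=$(printf '%s' "$c2" | cut -c1-"$n")
    printf '%s:%s:%s%s\n' "$bs" "$c1" "$c2" "$rest"
}
```

For example, with a synthetic input, `truncate_chunk2 '3:abcd:EFGHIJKLMN,"x"'` prints `3:abcd:EFGHIJK,"x"`.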
[PROMPT_EXAMPLE_BEGIN]
$ # Generate artificial test cases
$ cat >test <<_END
ssdeep,1.1--blocksize:hash:hash,filename
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVEatXSHlY31x:E4uV9FX,"1"
24:5nmkHuww9FXe0ZpPKoVH7bK3KT1Odk8gKgNWvoqzDVENXSCYA1x:E4uV9FX,"2"
_END
$ # This is the expected result.
$ $SSDEEP_FIXED/ssdeep -k test -x test
test:1 matches test:2 (100)
test:1 matches test:2 (100)
test:2 matches test:1 (100)
test:2 matches test:1 (100)
test:1 matches test:2 (100)
test:1 matches test:2 (100)
test:2 matches test:1 (100)
test:2 matches test:1 (100)
$ # This is the result from Debian versions of ssdeep.
$ ssdeep -k test -x test
test:1 matches test:2 (94)
test:1 matches test:2 (94)
test:2 matches test:1 (94)
test:2 matches test:1 (94)
test:1 matches test:2 (94)
test:1 matches test:2 (94)
test:2 matches test:1 (94)
test:2 matches test:1 (94)
$
[PROMPT_EXAMPLE_END]
As you can see, the buggy ssdeep/libfuzzy2 returns a score of 94, while
the fixed versions of ssdeep/libfuzzy2 return a score of 100 for these cases:
* file 1 and file 2
* file 1 and file 1 (matching itself)
* file 2 and file 2 (matching itself)
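The report does not spell out the cause, and the patch itself is not inlined. One explanation consistent with these numbers, stated purely as an assumption: ssdeep caps a chunk score for small block sizes at `block_size / MIN_BLOCKSIZE * min(len1, len2)` (with `MIN_BLOCKSIZE = 3`, and the cap skipped once the block size reaches `(99 + ROLLING_WINDOW) / MIN_BLOCKSIZE * MIN_BLOCKSIZE = 105`), and the buggy code scores the second chunk with the plain block size (24) instead of twice the block size (48). The function names `cap_score` and `max` are hypothetical; this sketch only replays that arithmetic for the identical 7-character second chunks.

```sh
# Simplified sketch of ssdeep's small-block-size cap (assumed from the
# upstream sources; not taken from the attached patch).
ROLLING_WINDOW=7
MIN_BLOCKSIZE=3

# cap_score RAW_SCORE BLOCK_SIZE LEN1 LEN2
# Print RAW_SCORE limited to BLOCK_SIZE / MIN_BLOCKSIZE * min(LEN1, LEN2),
# unless BLOCK_SIZE is large enough for the cap to be skipped.
cap_score() {
    score=$1 bs=$2 len1=$3 len2=$4
    threshold=$(( (99 + ROLLING_WINDOW) / MIN_BLOCKSIZE * MIN_BLOCKSIZE ))
    if [ "$bs" -ge "$threshold" ]; then
        echo "$score"
        return
    fi
    minlen=$len1
    if [ "$len2" -lt "$minlen" ]; then minlen=$len2; fi
    cap=$(( bs / MIN_BLOCKSIZE * minlen ))
    if [ "$score" -gt "$cap" ]; then score=$cap; fi
    echo "$score"
}

# max A B: print the larger of two integers
max() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Identical second chunks (raw score 100), 7 characters each.
echo "buggy: $(max 94 "$(cap_score 100 24 7 7)")"   # cap 56, result 94
echo "fixed: $(max 94 "$(cap_score 100 48 7 7)")"   # cap 112, result 100
```

Under this assumption the buggy path caps the second chunk at 24 / 3 * 7 = 56, so the overall result stays at 94 (presumably the first-chunk score), while scoring at block size 48 gives a cap of 112 and leaves 100 intact.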
The attached patch is an excerpt from Jesse Kornblum's actual patch
(applied in ssdeep 2.10), adapted to the Debian version 2.7-2.
By the way, I recommend UPGRADING THE UPSTREAM VERSION TO 2.10 in
`unstable' and `sid' instead of applying the patch, because ssdeep
version 2.10 also fixes some other bugs (which I did not encounter, but
others may).
Thanks, and I hope this will be fixed before `Jessie' is frozen.
Tsukasa OI
http://a4lg.com/
-- System Information:
Debian Release: 7.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-4-amd64 (SMP w/40 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages ssdeep depends on:
ii libc6 2.13-38+deb7u4
ssdeep recommends no packages.
ssdeep suggests no packages.
-- no debconf information
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fuzzy-patch-2.10-by-Jesse-Kornblum.patch
Type: text/x-diff
Size: 452 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/forensics-devel/attachments/20140908/374c34cf/attachment.patch>
More information about the forensics-devel mailing list