[Reproducible-commits] [diffoscope] 12/13: Switch to incremental interface for TLSH

Jérémy Bobbio lunar at moszumanska.debian.org
Thu Oct 15 16:04:35 UTC 2015


This is an automated email from the git hooks/post-receive script.

lunar pushed a commit to branch master
in repository diffoscope.

commit 2051c862f84ec07342835af1d5136708c4bc48b2
Author: Jérémy Bobbio <lunar at debian.org>
Date:   Thu Oct 15 14:42:25 2015 +0000

    Switch to incremental interface for TLSH
    
    This means we won't load 800 MiB or more in memory to compute the fuzzy hash!
---
 debian/control                   | 2 +-
 debian/pydist-overrides          | 1 +
 diffoscope/comparators/binary.py | 7 ++++++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/debian/control b/debian/control
index a8ae4f0..49e2df1 100644
--- a/debian/control
+++ b/debian/control
@@ -15,7 +15,7 @@ Build-Depends: debhelper (>= 9),
                python3-pytest,
                python3-rpm,
                python3-setuptools,
-               python3-tlsh,
+               python3-tlsh (>= 3.4.1),
                rpm-common
 Standards-Version: 3.9.6
 Homepage: http://diffoscope.org/
diff --git a/debian/pydist-overrides b/debian/pydist-overrides
index f456d28..74a61e8 100644
--- a/debian/pydist-overrides
+++ b/debian/pydist-overrides
@@ -1,2 +1,3 @@
 magic python-magic
 rpm python-rpm
+tlsh python-tlsh (>= 3.4.1)
diff --git a/diffoscope/comparators/binary.py b/diffoscope/comparators/binary.py
index 52defb0..1f14e79 100644
--- a/diffoscope/comparators/binary.py
+++ b/diffoscope/comparators/binary.py
@@ -118,7 +118,12 @@ class File(object, metaclass=ABCMeta):
             with self.get_content():
                 # tlsh is not meaningful with files smaller than 512 bytes
                 if os.stat(self.path).st_size >= 512:
-                    self._fuzzy_hash = tlsh.hash(open(self.path, 'rb').read())
+                    h = tlsh.Tlsh()
+                    with open(self.path, 'rb') as f:
+                        for buf in iter(lambda: f.read(32768), b''):
+                            h.update(buf)
+                    h.final()
+                    self._fuzzy_hash = h.hexdigest()
                 else:
                     self._fuzzy_hash = None
         return self._fuzzy_hash
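
The chunked-read pattern in the hunk above can be tried in isolation. The following is a minimal sketch, not diffoscope code: the helper name feed_in_chunks is hypothetical, and hashlib.sha256 stands in for tlsh.Tlsh() so that the example runs without python3-tlsh installed. The only thing it assumes about the hasher is an update() method.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 32768  # same buffer size as in the commit


def feed_in_chunks(path, hasher, chunk_size=CHUNK_SIZE):
    """Feed the file at `path` to hasher.update() one chunk at a time.

    Hypothetical helper: `hasher` is any object with an update(bytes) method.
    """
    with open(path, 'rb') as f:
        # iter(callable, sentinel) calls f.read() until it returns b'' (EOF),
        # so at most chunk_size bytes of the file are held at once.
        for buf in iter(lambda: f.read(chunk_size), b''):
            hasher.update(buf)
    return hasher


def demo():
    """Check that chunked hashing matches hashing the whole buffer at once."""
    data = os.urandom(100000)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        # sha256 is a stand-in for the fuzzy hash, used only for illustration.
        incremental = feed_in_chunks(tmp.name, hashlib.sha256()).hexdigest()
    finally:
        os.unlink(tmp.name)
    return incremental == hashlib.sha256(data).hexdigest()
```

With TLSH, the commit passes a tlsh.Tlsh() object through the same loop and calls final() before hexdigest().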

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/diffoscope.git


