[Reproducible-commits] [debbindiff] 01/01: Skip full comparison when small files match

Jérémy Bobbio lunar at moszumanska.debian.org
Wed Jan 7 10:58:12 UTC 2015


This is an automated email from the git hooks/post-receive script.

lunar pushed a commit to branch master
in repository debbindiff.

commit 64b76e73d7236a9f7227b7a30320c9717c85b1fe
Author: Jérémy Bobbio <lunar at debian.org>
Date:   Wed Jan 7 11:54:11 2015 +0100

    Skip full comparison when small files match
    
    Using magic and doing smart comparisons is slower than comparing bytes
    directly. Let's assume most files will actually be identical and start
    by comparing small files directly. If they match, we can skip the smart
    bits entirely.
    
    This leads to drastic run time improvements when comparing source tarballs
    with many small files (e.g. linux-source).
---
 debbindiff/comparators/__init__.py | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/debbindiff/comparators/__init__.py b/debbindiff/comparators/__init__.py
index 955d8dd..4997a7d 100644
--- a/debbindiff/comparators/__init__.py
+++ b/debbindiff/comparators/__init__.py
@@ -81,6 +81,8 @@ COMPARATORS = [
     (None, r'\.a$', compare_static_lib_files),
     ]
 
+SMALL_FILE_THRESHOLD = 65536 # 64 kiB
+
 
 def compare_files(path1, path2, source=None):
     if not os.path.isfile(path1):
@@ -89,6 +91,13 @@ def compare_files(path1, path2, source=None):
     if not os.path.isfile(path2):
         logger.critical("%s is not a file" % path2)
         sys.exit(2)
+    # try comparing small files directly first
+    size1 = os.path.getsize(path1)
+    size2 = os.path.getsize(path2)
+    if size1 == size2 and size1 <= SMALL_FILE_THRESHOLD:
+        if file(path1).read() == file(path2).read():
+            return []
+    # ok, let's do the full thing
     for mime_type_regex, filename_regex, comparator in COMPARATORS:
         if filename_regex and re.search(filename_regex, path1) \
            and re.search(filename_regex, path2):

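A minimal Python 3 sketch of the same fast path follows. It assumes a modern interpreter: the hunk above uses the Python 2 `file()` builtin, which no longer exists in Python 3, and it leaves the handles for the garbage collector to close. The helper `small_files_identical` and the `full_compare` callback (a stand-in for the loop over `COMPARATORS`) are hypothetical names chosen for illustration. They are not part of debbindiff's API.

```python
import os
from typing import Callable, List

# Same value as SMALL_FILE_THRESHOLD in the commit: 64 KiB.
SMALL_FILE_THRESHOLD = 65536


def small_files_identical(path1: str, path2: str,
                          threshold: int = SMALL_FILE_THRESHOLD) -> bool:
    """Hypothetical helper: True only if both files are small, equal in size
    and byte-for-byte identical."""
    size1 = os.path.getsize(path1)
    size2 = os.path.getsize(path2)
    # Different sizes can never match; large files go to the full comparison.
    if size1 != size2 or size1 > threshold:
        return False
    # Binary mode avoids newline translation and decoding errors;
    # the 'with' block closes both handles deterministically.
    with open(path1, 'rb') as f1, open(path2, 'rb') as f2:
        return f1.read() == f2.read()


def compare_files(path1: str, path2: str,
                  full_compare: Callable[[str, str], List]) -> List:
    """Sketch of the control flow: try the cheap check, else do the full thing.
    `full_compare` is a hypothetical stand-in for the COMPARATORS loop."""
    if small_files_identical(path1, path2):
        return []  # no differences: skip the expensive comparators entirely
    return full_compare(path1, path2)
```

Checking the sizes first keeps the fast path cheap. Two `stat` calls reject pairs of different sizes before any bytes are read, and the threshold bounds how much data is loaded into memory for each pair.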
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/debbindiff.git