[Reproducible-commits] [debbindiff] 01/01: Skip full comparison when small files match
Jérémy Bobbio
lunar at moszumanska.debian.org
Wed Jan 7 10:58:12 UTC 2015
This is an automated email from the git hooks/post-receive script.
lunar pushed a commit to branch master
in repository debbindiff.
commit 64b76e73d7236a9f7227b7a30320c9717c85b1fe
Author: Jérémy Bobbio <lunar at debian.org>
Date: Wed Jan 7 11:54:11 2015 +0100
Skip full comparison when small files match
Using magic and doing smart comparisons is slower than comparing bytes
directly. Let's assume most files will actually be identical and start
comparing small files directly. If they match, we can skip the smart
bits entirely.
This led to drastic run time improvements when comparing source tarballs
with many small files (e.g. linux-source).
---
debbindiff/comparators/__init__.py | 9 +++++++++
1 file changed, 9 insertions(+)
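The fast path described in the commit message can be sketched as a standalone function. This is a minimal illustration, not the code from the commit: the helper name small_files_match and its threshold parameter are hypothetical, and it uses open() in binary mode with a context manager because the file() builtin in the patch exists only in Python 2.

```python
import os

SMALL_FILE_THRESHOLD = 65536  # 64 KiB, same value as in the commit


def small_files_match(path1, path2, threshold=SMALL_FILE_THRESHOLD):
    """Return True only if both files are small, equal in size and byte-identical.

    Hypothetical helper for illustration; the commit inlines this logic
    in compare_files() rather than defining a separate function.
    """
    size1 = os.path.getsize(path1)
    size2 = os.path.getsize(path2)
    # Different sizes can never match; large files go to the full comparison.
    if size1 != size2 or size1 > threshold:
        return False
    # Binary mode avoids newline translation and decoding errors.
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        return f1.read() == f2.read()
```

A caller would return an empty list of differences when small_files_match() is true and fall through to the comparator lookup otherwise. A False result only means the fast path did not apply, not that the files differ in a meaningful way.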
diff --git a/debbindiff/comparators/__init__.py b/debbindiff/comparators/__init__.py
index 955d8dd..4997a7d 100644
--- a/debbindiff/comparators/__init__.py
+++ b/debbindiff/comparators/__init__.py
@@ -81,6 +81,8 @@ COMPARATORS = [
     (None, r'\.a$', compare_static_lib_files),
 ]
+SMALL_FILE_THRESHOLD = 65536 # 64 kiB
+
 def compare_files(path1, path2, source=None):
     if not os.path.isfile(path1):
@@ -89,6 +91,13 @@ def compare_files(path1, path2, source=None):
     if not os.path.isfile(path2):
         logger.critical("%s is not a file" % path2)
         sys.exit(2)
+    # try comparing small files directly first
+    size1 = os.path.getsize(path1)
+    size2 = os.path.getsize(path2)
+    if size1 == size2 and size1 <= SMALL_FILE_THRESHOLD:
+        if file(path1).read() == file(path2).read():
+            return []
+    # ok, let's do the full thing
     for mime_type_regex, filename_regex, comparator in COMPARATORS:
         if filename_regex and re.search(filename_regex, path1) \
            and re.search(filename_regex, path2):
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/debbindiff.git