[Reproducible-commits] [diffoscope] 01/01: Make comparison of zip archives with utf-8 file names more robust

Jérémy Bobbio lunar at moszumanska.debian.org
Wed Nov 18 11:53:02 UTC 2015


This is an automated email from the git hooks/post-receive script.

lunar pushed a commit to branch master
in repository diffoscope.

commit 7367033265affcddaa20dda39b65e15028ea477e
Author: Jérémy Bobbio <lunar at debian.org>
Date:   Wed Nov 18 11:55:40 2015 +0100

    Make comparison of zip archives with utf-8 file names more robust
    
    On systems where the filesystem encoding cannot represent every Unicode
    character (e.g. when `LC_ALL=C`), simply using ZipFile.extract() will
    crash while trying to encode the target path.
    
    So let's inline the core operation (open the member, then copy from source
    to destination) and build the target path ourselves, replacing any
    character that cannot be encoded.
    
    Closes: #805418
---
 diffoscope/comparators/zip.py | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py
index 377329f..dd7b915 100644
--- a/diffoscope/comparators/zip.py
+++ b/diffoscope/comparators/zip.py
@@ -17,7 +17,11 @@
 # You should have received a copy of the GNU General Public License
 # along with diffoscope.  If not, see <http://www.gnu.org/licenses/>.
 
+from contextlib import contextmanager
+import os.path
 import re
+import shutil
+import sys
 import zipfile
 from diffoscope.difference import Difference
 from diffoscope import tool_required
@@ -54,6 +58,10 @@ class ZipDirectory(Directory, ArchiveMember):
     def has_same_content_as(self, other):
         return False
 
+    @contextmanager
+    def get_content(self):
+        yield
+
     def is_directory(self):
         return True
 
@@ -75,7 +83,13 @@ class ZipContainer(Archive):
         return self.archive.namelist()
 
     def extract(self, member_name, dest_dir):
-        return self.archive.extract(member_name, dest_dir)
+        # We don't really want to crash if the filename in the zip archive
+        # can't be encoded using the filesystem encoding. So let's replace
+        # any weird character so we can get to the bytes.
+        targetpath = os.path.join(dest_dir, os.path.basename(member_name)).encode(sys.getfilesystemencoding(), errors='replace')
+        with self.archive.open(member_name) as source, open(targetpath, 'wb') as target:
+            shutil.copyfileobj(source, target)
+        return targetpath
 
     def get_member(self, member_name):
         zipinfo = self.archive.getinfo(member_name)
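
For readers who want to try the approach outside diffoscope, here is a
minimal standalone sketch of the same idea. It is not the project's code:
extract_member, demo and the explicit fs_encoding parameter are hypothetical
names introduced for illustration (the patch reads the encoding from
sys.getfilesystemencoding()). Unlike the patch, the sketch encodes only the
member's basename and converts dest_dir with os.fsencode(), and it assumes a
POSIX filesystem where '?' is a legal file name character.

```python
import os
import shutil
import tempfile
import zipfile


def extract_member(archive, member_name, dest_dir, fs_encoding):
    """Copy one zip member into dest_dir, tolerating unencodable names."""
    # Keep only the last path component, as the patch does.
    name = os.path.basename(member_name)
    # errors='replace' substitutes '?' for each character the codec
    # cannot represent, instead of raising UnicodeEncodeError.
    safe_name = name.encode(fs_encoding, errors='replace')
    target_path = os.path.join(os.fsencode(dest_dir), safe_name)
    # The inlined core of extraction: open the member, copy the bytes.
    with archive.open(member_name) as source, open(target_path, 'wb') as target:
        shutil.copyfileobj(source, target)
    return target_path


def demo():
    """Build a tiny archive with a non-ASCII member name and extract it."""
    workdir = tempfile.mkdtemp()
    try:
        zip_path = os.path.join(workdir, 'sample.zip')
        with zipfile.ZipFile(zip_path, 'w') as archive:
            archive.writestr('dir/caf\u00e9.txt', b'hello')
        with zipfile.ZipFile(zip_path) as archive:
            # Pretend the filesystem encoding is ASCII, as under LC_ALL=C.
            out = extract_member(archive, 'dir/caf\u00e9.txt', workdir, 'ascii')
            with open(out, 'rb') as extracted:
                return os.path.basename(out), extracted.read()
    finally:
        shutil.rmtree(workdir)
```

Calling demo() returns the sanitized file name together with the member's
content, so the bytes stay reachable even though the name lost a character.
Note that two members whose names differ only in unencodable characters
would map to the same target path; the sketch does not try to handle that.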

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/reproducible/diffoscope.git


