[Python-apps-commits] r6139 - in packages/ocrodjvu/branches/0.4.6/debian (3 files)

jwilk at users.alioth.debian.org
Sun Sep 26 19:03:48 UTC 2010


    Date: Sunday, September 26, 2010 @ 19:03:37
  Author: jwilk
Revision: 6139

Fix crash on hOCR with image elements.

Added:
  packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff
Modified:
  packages/ocrodjvu/branches/0.4.6/debian/changelog
  packages/ocrodjvu/branches/0.4.6/debian/patches/series

Modified: packages/ocrodjvu/branches/0.4.6/debian/changelog
===================================================================
--- packages/ocrodjvu/branches/0.4.6/debian/changelog	2010-09-26 19:03:07 UTC (rev 6138)
+++ packages/ocrodjvu/branches/0.4.6/debian/changelog	2010-09-26 19:03:37 UTC (rev 6139)
@@ -3,8 +3,9 @@
   * Fix URL in changelog-0.4.6.diff.
   * Preserve environment variables (except LC_*, LANG and LANGUAGE) when
     calling external programs (closes: #594385). [preserve-environment.diff]
+  * Fix crash on hOCR with image elements. [hocr-no-bbox.diff]
 
- -- Jakub Wilk <jwilk at debian.org>  Sun, 26 Sep 2010 16:47:48 +0200
+ -- Jakub Wilk <jwilk at debian.org>  Sun, 26 Sep 2010 21:01:50 +0200
 
 ocrodjvu (0.4.6-1) unstable; urgency=low
 

Added: packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff
===================================================================
--- packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff	                        (rev 0)
+++ packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff	2010-09-26 19:03:37 UTC (rev 6139)
@@ -0,0 +1,26 @@
+Description: Fix crash on hOCR with image elements.
+Origin: upstream, http://bitbucket.org/jwilk/ocrodjvu/changeset/e109b01b8455
+Last-Update: 2010-09-26
+
+--- a/lib/hocr.py
++++ b/lib/hocr.py
+@@ -422,7 +422,18 @@
+         result[:] = _replace_text(djvu_class, title, ''.join(result), settings)
+     elif settings.cuneiform and settings.cuneiform <= (0, 8) and djvu_class is const.TEXT_ZONE_PARAGRAPH:
+         result[:] = _replace_cuneiform08_paragraph(result[:], settings)
+-    if not bbox and not len(node):
++    if not bbox:
++        if len(node) == 0:
+            # Ocropus 0.2 doesn't always provide necessary bounding box
++            # information. We have no other choice than to drop such a broken
++            # zone silently.
++            # FIXME: This work-around is ugly and should be dropped at some point.
++            return
++        # If a bbox is undetermined, it's either because of:
++        # - malformed hOCR (but that should be noticed earlier/later), or
++        # - a zone with no children (which we're skipping here).
++        # We skip the zone even if the HTML element is not empty, i.e. len(node) > 0.
++        assert len(result) == 0
+         return
+     if settings.page_size is None:
+         raise errors.MalformedHocr('unable to determine page size')

Modified: packages/ocrodjvu/branches/0.4.6/debian/patches/series
===================================================================
--- packages/ocrodjvu/branches/0.4.6/debian/patches/series	2010-09-26 19:03:07 UTC (rev 6138)
+++ packages/ocrodjvu/branches/0.4.6/debian/patches/series	2010-09-26 19:03:37 UTC (rev 6139)
@@ -1,3 +1,4 @@
 changelog-0.4.6.diff
 tests-version.diff
 preserve-environment.diff
+hocr-no-bbox.diff
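
For readers who want the control flow of hocr-no-bbox.diff outside the diff context, here is a minimal standalone sketch. It is not the code from lib/hocr.py: the names Zone and build_zone are hypothetical and the surrounding hOCR parsing is omitted. It only illustrates the difference visible in the patch. The old condition (`not bbox and not len(node)`) returned early only for childless elements, so an element with children but no bounding box fell through to the code that follows. The new condition returns for every element without a bounding box.

```python
# Illustrative sketch only. Zone and build_zone are hypothetical names,
# not part of ocrodjvu; they mirror the branch structure of the patch.
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]


class Zone:
    """Stand-in for a text zone with a bounding box and child zones."""

    def __init__(self, bbox: BBox, children: List[object]) -> None:
        self.bbox = bbox
        self.children = children


def build_zone(bbox: Optional[BBox],
               n_html_children: int,
               result: List[object]) -> Optional[Zone]:
    """Return a Zone, or None when the element has to be skipped.

    bbox            -- bounding box determined for the element, if any
    n_html_children -- number of child HTML elements (len(node) in the patch)
    result          -- child zones collected so far
    """
    if not bbox:
        if n_html_children == 0:
            # Childless element without a bounding box: dropped silently.
            # This case was already handled before the patch.
            return None
        # Element has HTML children but no bounding box could be
        # determined. After the patch this is skipped as well, on the
        # expectation that no child zones were produced.
        assert len(result) == 0
        return None
    return Zone(bbox, result)
```

With this sketch, `build_zone(None, 1, [])` returns None instead of continuing with a missing bounding box, which is the case the commit message describes as hOCR with image elements.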



