[Python-apps-commits] r6139 - in packages/ocrodjvu/branches/0.4.6/debian (3 files)
jwilk at users.alioth.debian.org
Sun Sep 26 19:03:48 UTC 2010
Date: Sunday, September 26, 2010 @ 19:03:37
Author: jwilk
Revision: 6139
Fix crash on hOCR with image elements.
Added:
packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff
Modified:
packages/ocrodjvu/branches/0.4.6/debian/changelog
packages/ocrodjvu/branches/0.4.6/debian/patches/series
Modified: packages/ocrodjvu/branches/0.4.6/debian/changelog
===================================================================
--- packages/ocrodjvu/branches/0.4.6/debian/changelog 2010-09-26 19:03:07 UTC (rev 6138)
+++ packages/ocrodjvu/branches/0.4.6/debian/changelog 2010-09-26 19:03:37 UTC (rev 6139)
@@ -3,8 +3,9 @@
* Fix URL in changelog-0.4.6.diff.
* Preserve environment variables (except LC_*, LANG and LANGUAGE) when
calling external programs (closes: #594385). [preserve-environment.diff]
+ * Fix crash on hOCR with image elements. [hocr-no-bbox.diff]
- -- Jakub Wilk <jwilk at debian.org> Sun, 26 Sep 2010 16:47:48 +0200
+ -- Jakub Wilk <jwilk at debian.org> Sun, 26 Sep 2010 21:01:50 +0200
ocrodjvu (0.4.6-1) unstable; urgency=low
Added: packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff
===================================================================
--- packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff (rev 0)
+++ packages/ocrodjvu/branches/0.4.6/debian/patches/hocr-no-bbox.diff 2010-09-26 19:03:37 UTC (rev 6139)
@@ -0,0 +1,26 @@
+Description: Fix crash on hOCR with image elements.
+Origin: upstream, http://bitbucket.org/jwilk/ocrodjvu/changeset/e109b01b8455
+Last-Update: 2010-09-26
+
+--- a/lib/hocr.py
++++ b/lib/hocr.py
+@@ -422,7 +422,18 @@
+ result[:] = _replace_text(djvu_class, title, ''.join(result), settings)
+ elif settings.cuneiform and settings.cuneiform <= (0, 8) and djvu_class is const.TEXT_ZONE_PARAGRAPH:
+ result[:] = _replace_cuneiform08_paragraph(result[:], settings)
+- if not bbox and not len(node):
++ if not bbox:
++ if len(node) == 0:
+ # Ocropus 0.2 doesn't always provide necessary bounding box
++ # information. We have no other choice than to drop such a broken
++ # zone silently.
++ # FIXME: This work-around is ugly and should be dropped at some point.
++ return
++ # If a bbox is undetermined, it's either because of:
++ # - malformed hOCR (but that should be noticed earlier/later), or
++ # - a zone with no children (which we're skipping here).
++ # We skip the zone even if the HTML element is not empty, i.e. len(node) > 0.
++ assert len(result) == 0
+ return
+ if settings.page_size is None:
+ raise errors.MalformedHocr('unable to determine page size')
Modified: packages/ocrodjvu/branches/0.4.6/debian/patches/series
===================================================================
--- packages/ocrodjvu/branches/0.4.6/debian/patches/series 2010-09-26 19:03:07 UTC (rev 6138)
+++ packages/ocrodjvu/branches/0.4.6/debian/patches/series 2010-09-26 19:03:37 UTC (rev 6139)
@@ -1,3 +1,4 @@
changelog-0.4.6.diff
tests-version.diff
preserve-environment.diff
+hocr-no-bbox.diff
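
For readers skimming the hocr-no-bbox.diff hunk above, the control flow it introduces can be sketched roughly as follows. This is a simplified stand-in, not the code in lib/hocr.py: the Node class, the finish_zone function, its return values, and the use of None or a tuple for bbox are all hypothetical and exist only to make the example self-contained.

```python
# Hypothetical, simplified stand-in for the patched branch in lib/hocr.py.
# Node, finish_zone and the return values are illustrative assumptions only.


class Node(object):
    """Minimal element-like object: len(node) is the number of child elements."""

    def __init__(self, children=()):
        self.children = list(children)

    def __len__(self):
        return len(self.children)


def finish_zone(node, bbox, result):
    """Return None when the zone must be skipped, else a (bbox, result) pair."""
    if not bbox:
        if len(node) == 0:
            # Empty element without a bounding box: drop it silently.
            return None
        # Non-empty element without a bounding box: no child zones should
        # have been collected for it, so it is skipped as well.
        assert len(result) == 0
        return None
    # A bounding box is known, so the zone can be kept.
    return (bbox, list(result))


# Empty element, no bbox: skipped (the only case the old condition caught).
print(finish_zone(Node(), None, []))
# Non-empty element, no bbox: now skipped too instead of falling through.
print(finish_zone(Node([Node()]), None, []))
# Element with a bbox: kept.
print(finish_zone(Node(), (0, 0, 10, 10), ['text']))
```

Under these assumptions, the only behavioural change relative to the old condition (`not bbox and not len(node)`) is the second case: an element that has children but no bounding box no longer falls through to the code that needs a bbox.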