[Po4a-devel] Bug#480997: pod2man turns UTF8 into "X"s

Nicolas François nicolas.francois at centraliens.net
Sun Jun 1 15:28:29 UTC 2008


tags 480997 patch
thanks

On Mon, May 19, 2008 at 06:12:57PM -0700, rra at debian.org wrote:
> 
> Basically, the end result is that this is not a bug that I can fix without
> doing work that I'm not sure I have time to do.  I would certainly welcome
> patches to teach it to (optionally) output UTF-8 directly and just assume
> that the resulting device can cope; it should be a command-line option to
> start with.  This may not be too bad, and if I find time, I can try to see
> what I can do for the next release, but no promises there.

In fact, this is not far from being implemented.
What is missing is a way to change Pod::Man's "convert" formatting option
from pod2man.

Here is a patch to do that.
It adds a "convert" option to Pod::Man (enabled by default) and a --nonascii
option to pod2man, which disables Pod::Man's convert option.
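
A rough sketch of how the two halves of the patch fit together. The helper
name formatter_options is made up for illustration and is not part of the
patch; it only mimics the option handling and does not load Pod::Man:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the option handling added by the patch.
sub formatter_options {
    my (%options) = @_;

    # pod2man side: --nonascii is translated into convert => 0 and then
    # removed, so that only the convert option is passed on.
    if ($options{nonascii}) {
        $options{convert} = 0;
        delete $options{nonascii};
    }

    # Pod::Man side: conversion stays enabled unless explicitly set.
    $options{convert} = 1 unless defined $options{convert};

    return %options;
}

my %default = formatter_options();
my %raw     = formatter_options(nonascii => 1);
print "default:    convert=$default{convert}\n";
print "--nonascii: convert=$raw{convert}\n";
```

Without --nonascii the converter behaves as before; with it, non-ASCII
characters would be passed through to the output unchanged.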

Best Regards,
-- 
Nekral
-------------- next part --------------
--- ./lib/Pod/Man.pm	2007-12-18 11:47:07.000000000 +0100
+++ /usr/share/perl/5.10.0/Pod/Man.pm	2008-06-01 16:51:18.386959043 +0200
@@ -198,6 +198,8 @@
         unless defined $$self{release};
     $$self{indent} = 4
         unless defined $$self{indent};
+    $$self{convert} = 1
+        unless defined $$self{convert};
 
     # Double quotes in things that will be quoted.
     for (qw/center release/) {
@@ -320,9 +323,9 @@
     my ($self, $current, $element) = @_;
     my %options;
     if ($current) {
         %options = %$current;
     } else {
-        %options = (guesswork => 1, cleanup => 1, convert => 1);
+        %options = (guesswork => 1, cleanup => 1, convert => $$self{convert});
     }
     if ($element eq 'Data') {
         $options{guesswork} = 0;
@@ -1582,6 +1590,16 @@
 By default, section 1 will be used unless the file ends in .pm in which case
 section 3 will be selected.
 
+=item convert
+
+Convert the non-ASCII characters to their *roff equivalents.
+By default (or when set to 1) these characters are converted because some
+vendor *roff implementations can't handle eight-bit data.
+
+Another approach to support non-ASCII would be to map EE<lt>E<gt> escapes
+to the appropriate UTF-8 characters and then do a translation pass on the
+output according to the user-specified output character set.
+
 =back
 
 The standard Pod::Simple method parse_file() takes one argument naming the
@@ -1617,15 +1635,6 @@
 
 =head1 BUGS
 
-Eight-bit input data isn't handled at all well at present.  The correct
-approach would be to map EE<lt>E<gt> escapes to the appropriate UTF-8
-characters and then do a translation pass on the output according to the
-user-specified output character set.  Unfortunately, we can't send eight-bit
-data directly to the output unless the user says this is okay, since some
-vendor *roff implementations can't handle eight-bit data.  If the *roff
-implementation can, however, that's far superior to the current hacked
-characters that only work under troff.
-
 There is currently no way to turn off the guesswork that tries to format
 unmarked text appropriately, and sometimes it isn't wanted (particularly
 when using POD to document something other than Perl).  Most of the work
--- pod/pod2man.PL	2007-12-18 11:47:08.000000000 +0100
+++ /usr/bin/pod2man	2008-06-01 16:51:21.618958990 +0200
@@ -66,7 +33,7 @@
 GetOptions (\%options, 'section|s=s', 'release|r:s', 'center|c=s',
             'date|d=s', 'fixed=s', 'fixedbold=s', 'fixeditalic=s',
             'fixedbolditalic=s', 'name|n=s', 'official|o', 'quotes|q=s',
-            'lax|l', 'help|h', 'verbose|v') or exit 1;
+            'lax|l', 'help|h', 'verbose|v', 'nonascii') or exit 1;
 pod2usage (0) if $options{help};
 
 # Official sets --center, but don't override things explicitly set.
@@ -82,6 +49,12 @@
 # compatibility.
 delete $options{lax};
 
+# Ask Pod::Man to avoid converting characters
+if ($options{nonascii}) {
+    $options{convert} = 0;
+    delete $options{nonascii};
+}
+
 # Initialize and run the formatter, pulling a pair of input and output off at
 # a time.
 my $parser = Pod::Man->new (%options);
@@ -104,7 +77,7 @@
 [B<--center>=I<string>] [B<--date>=I<string>] [B<--fixed>=I<font>]
 [B<--fixedbold>=I<font>] [B<--fixeditalic>=I<font>]
 [B<--fixedbolditalic>=I<font>] [B<--name>=I<name>] [B<--official>]
-[B<--lax>] [B<--quotes>=I<quotes>] [B<--verbose>]
+[B<--lax>] [B<--quotes>=I<quotes>] [B<--verbose>] [B<--nonascii>]
 [I<input> [I<output>] ...]
 
 pod2man B<--help>
@@ -203,6 +176,15 @@
 files at once.  The convention for Unix man pages for commands is for the
 man page title to be in all-uppercase even if the command isn't.
 
+=item B<--nonascii>
+
+Do not convert non-ASCII characters to their *roff equivalents.
+By default, these characters are converted.
+
+Some vendor *roff implementations can't handle eight-bit data.
+You should make sure that the targeted *roff implementations handle
+eight-bit input data correctly.
+
 =item B<-o>, B<--official>
 
 Set the default header to indicate that this page is part of the standard

