[Po4a-devel]Some patches for the Man module

Sun, 26 Sep 2004 18:03:54 +0200

--uAKRQypu60I7Lcqm
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Sep 24, 2004 at 01:28:23AM +0200, Martin Quinson wrote:
> On Wed, Sep 22, 2004 at 11:19:30PM +0200, Nicolas Fran=E7ois wrote:
> > I should have released these patches a few time ago, but Alioth had t=
roubles.
>=20
> Yeah, you really should. Now, you'll have to create an account on aliot=
h so
> that we can give you the cvs write access, so that you can commit them
> yourself...

Alioth is up and running again. Here is my account: nekral-guest.

> > formats regression tests statistics:
> > $ ./stats.sh orig work
> >         IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
[...]
> >   IGN  1698     0     0     0     0     0     0     0
>=20
> Please add "was \ is" (or the contrary ;) in the upper left cell of the=
 tab
> to help my understanding.

Done. (In fact dir1\dir2)

> > The different categories are:
> >   IGN   man pages po4a refused to operate on (e.g. wad generated by
> >         Pod::Man)
[...]
> >   WDIFF These are probably bugs in po4a or in the man page (I started
> >         reporting some of them in the BTS, which is another way of
> >         improving po4a statistics)
> >       =20
> > In the table above, it is usually an improvement to have big numbers =
on
> > the bottom left corner (with the exception of the IGN column).
>=20
> Please put this as comment in the script (at least, if not by default u=
nder
> the table).

Done. I added a Usage for both scripts.
The categories are documented in check, because stats.sh could be used
with any other regression test tool.

> Where is the WOK category (diff sees a difference, but not wdiff even b=
efore
> edition) ?

I tried not to have too much categories, because they increase the
execution time.
I propose to add more categories latter, when there will be less WDIFF.
At this time, I'm not sure I'm reading WOK1 and WOK2 as very different.

> > Here are the patches for the Man module:
> >=20
> >   + comments
> >     It recognize some (probably incorrect, but usual) comment lines.
> >     Here are the results of the regression tests for this patch:
>=20
> Question: the info groff part you cite in comment seems to imply that=20
> .B toto \" a comment
> is a valid construct, but this is not handled by the code. You should a=
t
> least add a comment stating so, shouldn't you?=20

Yes, it is valid. (At least it is recognized as such by my groff).

My problem was: Ok, I recognized a comment inside a paragraph, what
should I do with this comment ?
Can I show it in the po (and how?)? Is there any interest in doing this?
Can I just trash the comments?

> That being said, if this feature is not used by the existing pages, don=
't
> bother implementing (=3D> let's commit).

At this time, '.B toto \" a comment' will generate
'\fBtoto \" a comment\fR', which is buggy because the Bold font is
not terminated and \" may mask the end of the paragraph.

It is largely used for commenting .de, .if,... (but those macros are not
supported (and will probably never be).
I've tried a grep on my pages, but there was to much noise. However The
issue should have raise in the regression test, so I think it is not used
at this time.

Maybe a die could do the trick ("comments are not supported yet, please
contact ...")

> >   + nested_fonts
> >     It deals with the nested font issue.
> >     I have an idea on how to simplify it a lot, but I think it could =
be
> >     applied, because it is doing a good job.
> >     The only remaining issue is with "un-terminated" fonts, as in:
> >       Hello, my name is \fINicolas \fBFRAN=C7OIS
> >     IMHO, in groff, there is no nested font (with some exceptions, li=
ke
> >     SB, and some italic and bold faces, or by using exotic tmac).
> >     \fIfoo\fBbar\fR is equivalent to \fIfoo\fR\fBbar\fR (with the
> >     exception of the \fP).
>=20
> Mmm. That's a very tought problem. As you saw in the code, I already tr=
ied
> to tackle it, in vain so far. I must admit that the regression result y=
ou
> show are rather impressive.=20
>=20
> But I do not like this patch anyway, for several reasons:
>=20
>  - no print "NEKRAL: bla" should remain. Change them to debugging state=
ments
>    or regular (translated) warnings, or whatever. Add a new debugging
>    category if needed. Same for "# print STDERR bla".
>   =20
>  - you are kind of cheating by not die()ing when you said in the commen=
ts
>    that you should... It removes a lot of PBS :)=20
>   =20
>    If the pages are ok afterward, then remove the fixme asking to die. =
If
>    not, die. Undetected errors are worse for the user than unprocessed
>    pages. He can always change the structure of the page to make po4a h=
appy.
>   =20
>  - you say that a proper implementation of \fP needs to keep track of t=
he
>    font used at the end of the last paragraph. Are we in bold after
>    \fB\fI\fB\fP\fP ? (in which case, we need to keep track of the whole
>    stack)=20

Yes.
That's the reason why I replace \fP by the exact font (starting from the =
left).
In the patch a loop is in charge of replacing \fB\fI\fB\fP\fP by:
\fB\fI\fB\fI\fP
and then
\fB\fI\fB\fI\fB
And then I can clean up the "nested fonts"

I Think doing this can avoid keeping track of the whole stack

Another insane example:
.BI t1 t2 t3\fRt4\fIt5 t6\fPt7\fPt8 t9
(such groff requests were found)

>  - this is really complex, I'm concerned about maintaining the beast wh=
en
>    you leave. Ie, you shouldn't have said you have a simpler solution i=
n
>    mind, now I want to see it ;)

1) change all font modifiers (e.g. .B, .RI,...) to the corresponding \f
   I'm thinking of doing this in shiftline (any objection ?), because I
   need to handle these lines in parse, in the .TP, .SH, and maybe other
   macro subroutines.
Then in pre_trans:
2) split the paragraph on \f
3) play with the first letters of each element of this array
   (they can be [1-9],B,I,R,CW,P,s(-|+)[0-9])
   - If the first letter is not recognized, then die (or give a \f to the
     translator)
   - If two consecutive elements start with the same font, merge them

   A special care shall be taken for the first elements (maybe the two fi=
rst,
   because of \fP) and for the last (maybe the last two) elements.
   I will probably do this with a global context that will keep the
   current and previous font (which will be reset by some macros like
   .SH).

I will do some experiments on this.
I think this will be better, cleaner, smaller.

>  - there is one stylistic rule I like in Perl: if you use a given varia=
ble
>    only once, remove it, you don't need it. @tmp_array1 falls in that
>    category. Likewise, I'd prefer to write (! $old eq "") as ($old ne "=
")
>    [please don't get offended by those stylistic remarks, this code is
>     incredibly hairly, I'd like to keep it under control]

I didn't knew ne ;) thanks!

How can I remove a variable (or maybe you mean I could have done it
without using variable with map)?

Using perldoc is still a little bit confusing for me. So please, no
hesitation on stylistic comments.

>  - I'm not sure I understand the "two concecutive \fR are not needed" p=
art
>    What about the following:
>    while ($str =3D~ /^(.*)\\fR    # Get the beginning, until a \fR
>                                 # Now, get the middle=20
>                     ((?:        # [this parenthesis clusters, don't cap=
ture]
> 		      [^\\]      # the middle is either not '\'
> 		       |         #   or
> 		      \\(?!f)    # '\', but not followed by 'f' (look ahead)
> 		     )*)
> 		    =20
> 		    \\fR        # end sentinel
> 		    (.*)$/sx) {# use x modifier to add documentation and space
>       $str =3D "$1\\fR$2$3";
>    }
>=20
>    Ok, I admit, this is not tested at all, and I just discovered the
>    "negative look ahead" black magic in man perlre, I may well be wrong=
. I'm
>    sure you'll correct me if it's the case, since you use this magic
>    elsewhere.=20

Yes. This is equivalent. I've tested it (with some minor changes).
I will use it, but this will be removed once I will have another font
algorithm.

>    We should use the x modifier more often to document the REs.

Right!
I'm not sure I can still read my regexp now!

> All that to say that I'd prefer to keep this one out of the CVS for now=
.
> =20
> >   + arg_next_line
> >     It allows arguments to be provided on the next line for some macr=
os
> >     (.SH, .I, ..., .BR, ...)
> >=20
> >     It works fine, but would require some cleanup (lots of redundant
> >     code).
> Erm. You'll say I'm picky, but I'd prefer you to do this cleanup before
> commit :)
I never said the patches were ready for commit;)

> >   + dot_lines
> >     po4a generated some lines starting with a dot. In those cases, a =
\&
> >     should be added to allow the line to be displayed. (for exemple:
> >     .I ../file
> >     is displayed in groff, but
> >     \fI../fil/\fR won't be displayed
> >     It also fix the same issue for lines starting by a "'"
>=20
> Arg, I'm so bad! I thought I got this one years ago. Thanks for chasing=
 it.
>=20
> Why don't you change it in paragraphs after the wrap? You'd get 'em all=
, and
> the code for that would be in only one location.
>=20
> For example, you may want to merge your change to the=20
>       $str =3D~ s/\n([.'"])/ $1/mg; #'
> just before. What about:
>       $str =3D~ s/^          # at the begin of line
>                  (?:\\f.)? # possibly followed by a font modifier
> 		 ([.'"])   # avoid '.' and '''
> 	       /\\&$1$2    # add \& in front to fix it
> 	       /mgx;

You're right, doing it in post_trans is better.
However, there's currently a minor bug with .TP:
.TP
.BI foo bar
will proposes for translation the ".BR foo bar" string instead of
"B<foo>I<bar>".
This issue doesn't raise any difference in the regression test, but
escaping the leading dot add some pages to WDIFF.

So I only propose a degraded mode:
#    # No . on first char, or nroff will think it's a macro
-    $str =3D~ s/\n([.'"])/ $1/mg; #'
+    unless (defined $self->{type} && $self->{type} =3D~ m/^TP$/) {
+        # This doesn't work when the translated string start
+        # by a macro (which should be avoided).
+        # At this time, .TP lines followed by .B, .BI,... produce
+        # such strings.
+        # This will be fixed latter.
+        #
+        # Only "." are handled now.
+        # "'" will be handled latter (when it will also be handled in th=
e
+        # parse subroutine.
+        $str =3D~ s/^(            # at the beginning of a line
+                    (?:[BI]<)?  # possibly followed by a font modifier
+                    [.])        # avoid "." and "'"
+                 /\\&$1/mgx;    # add \& in front to fix it
+    }
+    $str =3D~ s/\n([.'])/\n\\&$1/mg; # degraded mode for "'", and "." in=
 TP
+                                   # (doesn't handle the first line)

The unless, a part of the comments and the last regexp could be
removed latter. As ' could be added to [.] latter.

This seems to work fine. I will prepare a patch or commit it.

> >   + new_macros
> >     Some new macros:
> >       .R
> >       .EX and .EE
> >       .so and .mso
> >       .cs
> >       minimal support (when no argument is given) for:
> >         .ce
> >         .ul
> >         .cu
>=20
> This one is not in the tarbal. :-/

It should be attached this time.
Do you think the .so/.mso part is OK ?

> >   + escape
> >     It tries to deal with the \c escape.
> >     It still need some work.
>=20
> Kill the die I did put if it's not needed anymore.
>=20
> You could override the pushline function, if you understand Perl object=
s. I
> don't.=20

I've done some tests. Overriding pushline is OK. I will try to handle \c
there.

> What is the effect of \c, again?

info groff --index-search "Line Control"

> >   + others
> >     some other minor points that I could isolate from my working dire=
ctory
>=20
> What is the change for cdrdao ? It deals with non escaped leading space=
s, am
> I right? Does it still deal with not leading spaces? I don't think so. =
you
> may want to kill those ^=20

cdrdao uses:
.IP CATALOG\ "ddddddddddddd"
(Here, the quote have to be displayed)
When this passes through pushmacro, additional quotes shouldn't be added,
because
.IP "CATALOG\ "ddddddddddddd""
(this is understood by groff as an IP request with 2 arguments: 'CATALOG\=
 '
and 'ddddddddddddd""')

The regexp proposed in the patch would be shorter with a negative
look-behind regexp.

> Ok for the SH macro.

Do you think it could be done in the $macro{'SH'} sub ?

> For the space at the begining of paragraph, shouldn't it be handled alo=
ng
> with ' or . at the beginning of line?
>=20
> >   + split_args
> >     This fix an issue for the limits.conf man page.
> >     It was also reported in #268904
> >     It adds one string for the translation.
>=20
> Ok, I tried to understand your bug report three times, in vain. It's no=
t now
> at 1 am that I will understand the patch. I belive you on this one.

It is probably useful for only one man page. It may not=20

> Did you use quilt, at least? ;)
No.
After two mails bounced by Alioth I had a look at the diff and thought it
was time to split it in different patches.

> PS: I forgot whether you're on the list, thus the cc.
I'm on the list, but thanks for the cc (it permitted me to receive it
some days before).

--=20
Nekral

--uAKRQypu60I7Lcqm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="Man.pm.new_macros.patch"

--- Man.pm.orig	2004-09-21 00:38:09.000000000 +0200
+++ Man.pm	2004-09-22 00:11:35.000000000 +0200
@@ -579,7 +579,7 @@ sub parse{
 	    }
 	    # Done with spliting the args. Do the job.
 
-	    if ($macro eq 'B' || $macro eq 'I') {
+	    if ($macro eq 'B' || $macro eq 'I' || $macro eq 'R') {
 		# pass macro name
 		shift @args;
 		my $arg=join(" ",@args);
@@ -668,8 +668,9 @@ sub parse{
 	    # Special case:
 	    #  .nf => stop wrapped mode
 	    #  .fi => wrap again
-	    if ($macro eq 'nf' || $macro eq 'fi') {
-		if ($macro eq 'fi') {
+	    if ($macro eq 'nf' || $macro eq 'fi' ||
+                $macro eq 'EX' || $macro eq 'EE') {
+		if ($macro eq 'fi' || $macro eq 'EE') {
 		    $wrapped_mode='YES';
 		} else {
 		    $wrapped_mode='MACRONO';
@@ -986,9 +987,13 @@ $macro{'ps'}=\&untranslated;
 # .so filename Include source file.
 # .mso groff variant of .so (other search path)
 $macro{'so'}= $macro{'mso'} = sub {
-    die "po4a::man: ".sprintf(
-      dgettext("po4a","This page includes another file with '%s'. This is not supported yet, but will soon."),
+    print STDERR "po4a::man: ".sprintf(
+      dgettext("po4a","This page includes another file with '%s'. Don't forget to translate this file, and to make it available at the right place."),
 	$_[1])."\n",;
+    my ($self,$macroname,$macroarg)=(shift,shift,join(" ",@_));
+    
+    $self->pushmacro($macroname,
+		     $self->t($macroarg));
 };
 # .sp     Skip one line vertically.
 # .sp N   Space  vertical distance N
@@ -1065,13 +1070,53 @@ $macro{'pc'}=\&translate_joined;
 # .rs    Enable them again
 $macro{'ns'}=$macro{'rs'}=\&untranslated;
 
-# All of these are not handled yet because the number of line may change
-# during the translation
+# .cs font [width [em-size]]
+# Switch to and from "constant glyph space mode".
+$macro{'cs'}=\&untranslated;
+
+# .ss word_space_size [sentence_space_size]
+# Change the minimum size of a space between filled words.
+$macro{'ss'}=\&untranslated;
 
 # .ce     Center one line horizontaly
 # .ce N   Center N lines
 # .ul N   Underline N lines (but not the spaces)
 # .cu N   Underline N lines (even the spaces)
+$macro{'ce'}=$macro{'ul'}=$macro{'cu'}=sub {
+    my $self=shift;
+    if (defined $_[1]) {
+        if ($_[1] <= 0) {
+            # disable centering, underlining, ...
+            $self->pushmacro($_[0]);
+        } else {
+# All of these are not handled yet because the number of line may change
+# during the translation
+            die sprintf("po4a::man: ".
+               dgettext("po4a","This page uses the '%s' request with the ".
+                               "number of lines in argument. This is not ".
+                               "supported yet.\n"),$_[0])."\n";
+        }
+    } else {
+	$self->pushmacro($_[0]);
+    }
+};
+
+# .ec [c]
+# Set the escape character to C.  With no argument the default
+# escape character `\' is restored.  It can be also used to
+# re-enable the escape mechanism after an `eo' request.
+$macro{'ec'}=sub {
+    my $self=shift;
+    if (defined $_[1]) {
+        die sprintf("po4a::man: ".
+           dgettext("po4a","This page uses the '%s' request.  This request ".
+                           "is only supported when no argument is ".
+                           "provided.\n"),$_[0])."\n";
+    } else {
+        $self->pushmacro($_[0]);
+    }
+};
+
 
 ###
 ### BSD compatibility macros: .AT and .UC

--uAKRQypu60I7Lcqm--