[Po4a-devel]Some patches for the Man module

Martin Quinson mquinson@ens-lyon.fr
Fri, 24 Sep 2004 01:28:23 +0200


--T4sUOijqQbZv57TR
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Sorry for the delay.

On Wed, Sep 22, 2004 at 11:19:30PM +0200, Nicolas Fran=E7ois wrote:
> Hello,
>=20
> I should have released these patches a few time ago, but Alioth had troub=
les.

Yeah, you really should. Now, you'll have to create an account on alioth so
that we can give you the cvs write access, so that you can commit them
yourself...=20

> Sorry if this mail is a little bit long.

You must be kidding... Thanks for your work on po4a. And my answer is even
longer :)

> Some are still not "ready for production", and are provided to inform you
> I'm working on those subjects (and also to grab some ideas).

ok, here we go.

> formats regression tests statistics:
> $ ./stats.sh orig work
>         IGN    OK   OK2  WOK1  WOK2  WOK3   PBS WDIFF
>   IGN  1698     0     0     0     0     0     0     0
>    OK     0   125     0     0     0     0     0     0
>   OK2     0     0  1564     4     2     0     0     0
>  WOK1     0     8     0    89     0     1     0     0
>  WOK2     0     0     0     0   208     0     0     1
>  WOK3     0     2    11     2     4   301     3     0
>   PBS     0    35   150    15    48    43   585    14
> WDIFF     0     1     9     4     5     8     0    39
> total:  4979 | 4979

Please add "was \ is" (or the contrary ;) in the upper left cell of the tab
to help my understanding.

> The different categories are:
>   IGN   man pages po4a refused to operate on (e.g. wad generated by
>         Pod::Man)
>   OK    diff -uBb didn't see any difference
>         This can contain very rare misformatting
>   OK2   diff -uBb didn't see any difference after converting hyphens to
>         minus sign, `` to ", and '' to " in both man pages
>         This contains a little bit more misformatting, for example an man
>         page referring to an empty argument ('') should not display only =
".
>   WOK1  wdiff doesn't see any difference after the same modifications
>   WOK2  This tries to detect changes in the hyphenation of words (but has
>         more false negative)
>   WOK3  This removes minus signs, and thus detects more changes in
>         hyphenation
>   PBS   po4a preferred to stop processing the man page (non supported
>         macro, ...)
>   WDIFF These are probably bugs in po4a or in the man page (I started
>         reporting some of them in the BTS, which is another way of
>         improving po4a statistics)
>       =20
> In the table above, it is usually an improvement to have big numbers on
> the bottom left corner (with the exception of the IGN column).

Please put this as comment in the script (at least, if not by default under
the table).

Where is the WOK category (diff sees a difference, but not wdiff even before
edition) ?

> Here are the patches for the Man module:
>=20
>   + comments
>     It recognize some (probably incorrect, but usual) comment lines.
>     Here are the results of the regression tests for this patch:

Question: the info groff part you cite in comment seems to imply that=20
=2EB toto \" a comment
is a valid construct, but this is not handled by the code. You should at
least add a comment stating so, shouldn't you?=20

That being said, if this feature is not used by the existing pages, don't
bother implementing (=3D> let's commit).

>   + nested_fonts
>     It deals with the nested font issue.
>     I have an idea on how to simplify it a lot, but I think it could be
>     applied, because it is doing a good job.
>     The only remaining issue is with "un-terminated" fonts, as in:
>       Hello, my name is \fINicolas \fBFRAN=C7OIS
>     IMHO, in groff, there is no nested font (with some exceptions, like
>     SB, and some italic and bold faces, or by using exotic tmac).
>     \fIfoo\fBbar\fR is equivalent to \fIfoo\fR\fBbar\fR (with the
>     exception of the \fP).

Mmm. That's a very tought problem. As you saw in the code, I already tried
to tackle it, in vain so far. I must admit that the regression result you
show are rather impressive.=20

But I do not like this patch anyway, for several reasons:

 - no print "NEKRAL: bla" should remain. Change them to debugging statements
   or regular (translated) warnings, or whatever. Add a new debugging
   category if needed. Same for "# print STDERR bla".
  =20
 - you are kind of cheating by not die()ing when you said in the comments
   that you should... It removes a lot of PBS :)=20
  =20
   If the pages are ok afterward, then remove the fixme asking to die. If
   not, die. Undetected errors are worse for the user than unprocessed
   pages. He can always change the structure of the page to make po4a happy.
  =20
 - you say that a proper implementation of \fP needs to keep track of the
   font used at the end of the last paragraph. Are we in bold after
   \fB\fI\fB\fP\fP ? (in which case, we need to keep track of the whole
   stack)=20

 - this is really complex, I'm concerned about maintaining the beast when
   you leave. Ie, you shouldn't have said you have a simpler solution in
   mind, now I want to see it ;)

 - there is one stylistic rule I like in Perl: if you use a given variable
   only once, remove it, you don't need it. @tmp_array1 falls in that
   category. Likewise, I'd prefer to write (! $old eq "") as ($old ne "")
   [please don't get offended by those stylistic remarks, this code is
    incredibly hairly, I'd like to keep it under control]

 - I'm not sure I understand the "two concecutive \fR are not needed" part
   What about the following:
   while ($str =3D~ /^(.*)\\fR    # Get the beginning, until a \fR
                                # Now, get the middle=20
                    ((?:        # [this parenthesis clusters, don't capture]
		      [^\\]      # the middle is either not '\'
		       |         #   or
		      \\(?!f)    # '\', but not followed by 'f' (look ahead)
		     )*)
		    =20
		    \\fR        # end sentinel
		    (.*)$/sx) {# use x modifier to add documentation and space
      $str =3D "$1\\fR$2$3";
   }

   Ok, I admit, this is not tested at all, and I just discovered the
   "negative look ahead" black magic in man perlre, I may well be wrong. I'm
   sure you'll correct me if it's the case, since you use this magic
   elsewhere.=20
  =20
   We should use the x modifier more often to document the REs.


All that to say that I'd prefer to keep this one out of the CVS for now.
=20
>   + arg_next_line
>     It allows arguments to be provided on the next line for some macros
>     (.SH, .I, ..., .BR, ...)
>=20
>     It works fine, but would require some cleanup (lots of redundant
>     code).

Erm. You'll say I'm picky, but I'd prefer you to do this cleanup before
commit :)

>     It can be applied cleanly on CVS, but require the 'nested_fonts' patch
>     to operate cleanly.

Well, you really should send us more mails to avoid having to deal with
interdepending patches. Those beast are sooo nasty to deal with, I'm really
sorry for you.

>     (here most of the new 'WDIFF' man pages are bug in the man page, and a
>     bug was filed in the BTS)

Hey, cool.
=20
>   + dot_lines
>     po4a generated some lines starting with a dot. In those cases, a \&
>     should be added to allow the line to be displayed. (for exemple:
>     .I ../file
>     is displayed in groff, but
>     \fI../fil/\fR won't be displayed
>     It also fix the same issue for lines starting by a "'"

Arg, I'm so bad! I thought I got this one years ago. Thanks for chasing it.

Why don't you change it in paragraphs after the wrap? You'd get 'em all, and
the code for that would be in only one location.

For example, you may want to merge your change to the=20
      $str =3D~ s/\n([.'"])/ $1/mg; #'
just before. What about:
      $str =3D~ s/^          # at the begin of line
                 (?:\\f.)? # possibly followed by a font modifier
		 ([.'"])   # avoid '.' and '''
	       /\\&$1$2    # add \& in front to fix it
	       /mgx;

Same story, not tested at all.

>   + hyphen
>     I had a obligation to fix this because I said Martin that replacing
>     hyphens by minus signs were always allowed.
>     In fact, it should not be modified in
>       - .so/.mso arguments
>       - after a \s (font size modifier, e.g. \s-2)
>     I also added a comment on why I *hate* hyphens.

I like this one ;)

Change all items to an itemize in the comment, and, if you don't mind,
remove your name. That's just a suggestion. No need to overload the code
with who hate hyphen the most. I do to. But, by the way, add your name to
the authors of this module, and of po4a.

Please document the RE. I'm sure I'll forget that ?<! means negative look
behind until I'll need to read it ;)

s/they most of the time break/they break most of the time/

>   + new_macros
>     Some new macros:
>       .R
>       .EX and .EE
>       .so and .mso
>       .cs
>       minimal support (when no argument is given) for:
>         .ce
>         .ul
>         .cu

This one is not in the tarbal. :-/

>   + escape
>     It tries to deal with the \c escape.
>     It still need some work.

Kill the die I did put if it's not needed anymore.

You could override the pushline function, if you understand Perl objects. I
don't.=20

What is the effect of \c, again?

>   + others
>     some other minor points that I could isolate from my working directory

What is the change for cdrdao ? It deals with non escaped leading spaces, am
I right? Does it still deal with not leading spaces? I don't think so. you
may want to kill those ^=20

Ok for the SH macro.

For the space at the begining of paragraph, shouldn't it be handled along
with ' or . at the beginning of line?

>   + split_args
>     This fix an issue for the limits.conf man page.
>     It was also reported in #268904
>     It adds one string for the translation.

Ok, I tried to understand your bug report three times, in vain. It's not now
at 1 am that I will understand the patch. I belive you on this one.

>   + all
>     all the above patch, and more.
>     It also contains some comments that should be removed.
>     The results are presented in the first table.

Nope, forget about this one ;)

> Comments (and commits;) are welcome.

I will commit some of them (I'm now offline), but I'd really prefer you to
get the write access. It would help you concentrating on the bugs instead of
playing with interdependent patches.=20

Did you use quilt, at least? ;)

> Thanks for those who read this mail to the end,

Many many thanks to you.

Mt.

PS: I forgot whether you're on the list, thus the cc.

--T4sUOijqQbZv57TR
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBU1wXSJAMsfOxudIRAnGkAJ0UdGAy8OHFXYBT0QDXeFYFBvHptACfXBAa
sj6CyN7upM2vZ6rppsVKOa8=
=QLdQ
-----END PGP SIGNATURE-----

--T4sUOijqQbZv57TR--