[Po4a-devel]Call for a (La)TeX module

Martin Quinson martin.quinson@imag.fr
Fri, 26 Nov 2004 06:29:24 +0100



Hello,

I think there is a pressing need for a TeX module in po4a. This is the last
major documentation format missing from our lineup, and people keep asking
about it. Nicolas (CCed) mailed Denis and me privately about that a week
ago, Nekral mailed me yesterday, and so on.

This format family may allow us to deal with texinfo documentation (all GNU
documentation), with book translators (like Nicolas) and maybe even with the
Python documentation (I trust Nekral on this one). And, more important than
all the rest, I'm tired of translating my presentations and articles
manually :)

The problem that has kept me away from doing this until now is that, like
groff, TeX is a programming language in which you can define new macros. It's
even worse since authors actually do define macros (in groff, a few steal
some classic macros here and there, but most people don't bother).

As with the other formats, po4a does not intend to become a full-featured
format interpreter (à la HeVeA). It just intends to parse the input and split
it into msgids. I have some ideas about how to do so, but it's really
impossible for me to start a new module implementation. So, I'll explain my
plans here, and hope that someone will step in...


For [the rare] documents not defining any new macros and sticking to
unadulterated LaTeX, it should be rather easy to build a first prototype
simply splitting on the boundaries between TeX's vertical and horizontal
modes.

 - As usual (hello Yves), you need to distinguish between inline tags (oops,
   macros), which you ignore (such as \textit or \footnotesize or $bla$), and
   formatting ones, for which you translate the argument (such as \section,
   \subsubsection or $$bla$$).

 - Translate the content of each environment separately.

 - Some macros need more complex handling, I'm sure.

 - Translate each item separately (of an itemize and the like).

 - Naturally translate separately each paragraph separated by empty lines.

 - Ignore stuff like \medskip, since it is formatting only.
   Hint: it's used in vertical mode. (If there is some \newpage, I guess
   you're dead.)
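
The list above could be prototyped roughly as follows. This is only a sketch
in Python (the real module would be Perl, like the rest of po4a), and the
names split_units, SKIP_ONLY and STRUCTURAL are made up for the illustration.
It assumes a document with no \newcommand, treats a hand-picked set of macros
as formatting only, and does not handle environments or items at all.

```python
import re

# Assumption: a tiny, hand-picked set of vertical-mode, formatting-only macros.
SKIP_ONLY = {r"\medskip", r"\smallskip", r"\bigskip"}
# Assumption: macros whose argument becomes a translation unit of its own.
STRUCTURAL = re.compile(r"^\\(section|subsection|subsubsection)\*?\{(.*)\}\s*$")


def split_units(source):
    """Return the translatable units (future msgids) of a simple LaTeX text."""
    units = []
    # Paragraphs are separated by one or more empty lines.
    for block in re.split(r"\n\s*\n", source):
        lines = []
        for line in block.splitlines():
            stripped = line.strip()
            if not stripped or stripped in SKIP_ONLY:
                continue  # formatting only, nothing to translate
            match = STRUCTURAL.match(stripped)
            if match:
                # Flush the running paragraph, then emit the title alone.
                if lines:
                    units.append(" ".join(lines))
                    lines = []
                units.append(match.group(2))
            else:
                # Inline macros such as \textit stay inside the paragraph.
                lines.append(stripped)
        if lines:
            units.append(" ".join(lines))
    return units
```

Inline macros are left untouched inside their paragraph, so the translator
sees them in the msgid, while section titles end up as separate units.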

And so on and so forth. I believe in this approach for simple documents.
There are two main jobs here:

 - write a proper parser, which can detect macros, separate their arguments,
   etc. This may be the more difficult part. TeX has \ and { all over the
   place. You'll have to protect them, and come up with a usable way to
   determine the } corresponding to a given { (so that what lies in between
   can be treated as a macro argument).

   Classical constructions (item) should be dealt with in there. All the
   rest should be passed to a macro handler, just as in the man module.

 - read a LaTeX definition and write the right handlers for the right macros.
   There will be a bunch of duplicated work if you don't do as in the man
   module (or come up with a better idea, of course).
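
To make the first job more concrete, here is one possible way to find the }
matching a given { and to cut a macro call into its arguments. Again a Python
sketch only, with hypothetical function names (find_matching_brace,
parse_macro). It assumes that a backslash always escapes the next character,
and it knows nothing about % comments, verbatim material or optional [...]
arguments, which a real parser would have to handle.

```python
def find_matching_brace(text, open_pos):
    """Return the index of the '}' closing the '{' found at open_pos."""
    if text[open_pos] != "{":
        raise ValueError("open_pos must point at '{'")
    depth = 0
    i = open_pos
    while i < len(text):
        char = text[i]
        if char == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")


def parse_macro(text, pos):
    """Parse a call like \\name{arg1}{arg2} at pos; return (name, args, end)."""
    if text[pos] != "\\":
        raise ValueError("pos must point at a backslash")
    end = pos + 1
    while end < len(text) and text[end].isalpha():
        end += 1
    name = text[pos + 1:end]
    args = []
    # Collect every brace group that directly follows the macro name.
    while end < len(text) and text[end] == "{":
        close = find_matching_brace(text, end)
        args.append(text[end + 1:close])
        end = close + 1
    return name, args, end
```

The (name, args) pair is what would then be handed to a per-macro handler,
in the spirit of what the man module does.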

Once this is done, you'll be able to deal with documents with no
\newcommand. For new definitions, I guess that the only viable idea is to
go for specifically formatted comments in the document (lines beginning with
'%po4a:' ?) to explain which category each macro belongs to. You may even
want to allow the interpretation of Perl code embedded in the document, if
you're not concerned about security *at all*.
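
As an illustration of the '%po4a:' idea, the comments could be collected like
this. The directive syntax used here ("%po4a: <category> \macro ...") and the
category names are pure assumptions of mine, nothing is decided yet, and the
function name read_macro_categories is hypothetical too. Sketch in Python:

```python
import re

# Hypothetical directive syntax: "%po4a: <category> \macro1 \macro2 ..."
DIRECTIVE = re.compile(r"^%po4a:\s*(\w+)\s+(.*)$")


def read_macro_categories(source):
    """Map each macro declared in a %po4a: comment to its category."""
    categories = {}
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if not match:
            continue  # ordinary text or ordinary comment
        category, names = match.groups()
        for name in names.split():
            # Store the macro name without its leading backslash.
            categories[name.lstrip("\\")] = category
    return categories
```

The resulting table only records declarations. Deciding what each category
means for the msgids is left to the macro handlers.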


If you want to give it a try, you're welcome, whoever you are. Just check the
documentation to see how po4a works.
man Locale::Po4a::TransTractor -> root of the project.
man Locale::Po4a::Sgml -> some ideas about the categories; file inclusion.
man Locale::Po4a::Man -> some ideas about the macro handler mechanism.

Then, mail us. Then start coding.


That's all I can think of at 6.30 am on a night train bringing me to yet
another job interview...

Please comment/forward/react.
Mt.
