[Po4a-devel] Design of the (La)TeX module

Nicolas François nicolas.francois@centraliens.net
Sat, 11 Dec 2004 12:15:29 +0100


On Fri, Dec 10, 2004 at 11:34:32PM +0100, Denis Barbier wrote:
> On Mon, Dec 06, 2004 at 09:47:09PM +0100, Nicolas François wrote:
> > parse:
> >   * The parse function will only separate paragraphs. The separator is an
> >     empty line or a line beginning with a comment.
> >     What I'm calling here a "paragraph" is not a paragraph in the output
> >     document, but a block of code separated by one of these separators.
>
> No, comments should simply be ignored when splitting into paragraphs.
> It is not uncommon to write a comment within a paragraph.

I'm not sure I understand.

Here is what I'm doing when I encounter the following block:

foo
bar %baz
qux
%quux
corge
grault

I first remove the %baz comment and store it in a table. This comment will
disappear from the final document, but I will show it in the PO (possibly
in the wrong place) because it may help the translation.
Then, for the %quux comment, I consider that it separates two
paragraphs, which will be translated separately.

So, I'm ignoring the first comment when splitting into paragraphs, but not
the second one. Is this OK, or should I also ignore the second kind?
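
The splitting rule described above could be sketched as follows. This is
only an illustration in Python, not the Perl code of the attached module:
the name split_paragraphs is hypothetical, and attaching a separator
comment such as %quux to the paragraph that follows it is an assumption
(the message does not say where that comment is stored). A '%' escaped as
'\%' is not treated as a comment; other TeX subtleties are ignored.

```python
import re

# A '%' starts a comment unless it is escaped as '\%' (simplified rule).
COMMENT = re.compile(r'(?<!\\)%(.*)$')


def split_paragraphs(text):
    """Return a list of (paragraph_text, comments) pairs.

    Paragraph separators are empty lines and comment-only lines.
    Inline comments are removed from the text and kept in a side list.
    """
    paragraphs = []
    lines, comments = [], []

    def flush():
        # Close the current paragraph, if it has any content.
        if lines:
            paragraphs.append(("\n".join(lines), list(comments)))
        lines.clear()
        comments.clear()

    for raw in text.splitlines():
        match = COMMENT.search(raw)
        code = (raw[:match.start()] if match else raw).strip()
        if not code:
            # Empty line or comment-only line: paragraph boundary.
            flush()
        else:
            lines.append(code)
        if match:
            # Assumption: a separator comment goes with the next paragraph.
            comments.append(match.group(1).strip())
    flush()
    return paragraphs
```

On the block above, this yields two paragraphs: foo/bar/qux with the
stored comment "baz", and corge/grault with the comment "quux".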


> > translate_buffer: return the translation of a buffer (typically a
> > paragraph or a subset of a paragraph)
>
> See above, IMO it should be a paragraph.

Suppose I encounter the following paragraph:

\chapter{Lexical analysis\label{lexical}}

This buffer will be given to translate_buffer, which will split it into
one command with one argument, and will call the chapter subroutine. This
subroutine will then call back translate_buffer with the content of this
argument: "Lexical analysis\label{lexical}".
Then translate_buffer splits this into one buffer (Lexical analysis) and
one (trailing) command with one argument (\label{lexical}). It will
translate the buffer, and call the label subroutine.

The same thing can happen if a textual paragraph ends with a footnote. The
footnote can (and IMHO should) be translated separately.

That's why I wrote "a paragraph or a subset of a paragraph".
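
The recursion described above can be illustrated with a small Python
sketch. It is not the attached Perl module: the handler table HANDLERS,
the helper read_group, and the stand-in translate_text (which just wraps
the string in <...> instead of looking it up in a PO file) are all
hypothetical. Only the \chapter and \label commands from the example are
registered, and trailing commands are assumed to take one argument
without nested braces.

```python
import re

LEADING = re.compile(r'\s*\\([A-Za-z]+)\{')
TRAILING = re.compile(r'\\([A-Za-z]+)\{([^{}]*)\}\s*$')


def read_group(text, start):
    """Return (content, end) for the brace group opening at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == '{':
            depth += 1
        elif text[i] == '}':
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


def translate_text(text):
    """Stand-in for the PO lookup: mark the string as translated."""
    return "<" + text + ">"


def chapter(arg):
    # The argument is handed back to translate_buffer (the recursion).
    return r"\chapter{%s}" % translate_buffer(arg)


def label(arg):
    # A label is an identifier: it is left untranslated.
    return r"\label{%s}" % arg


HANDLERS = {"chapter": chapter, "label": label}


def translate_buffer(buffer):
    out = ""
    # Leading commands: call their subroutine, loop until none is left.
    while True:
        m = LEADING.match(buffer)
        if not m or m.group(1) not in HANDLERS:
            break
        arg, end = read_group(buffer, m.end() - 1)
        out += HANDLERS[m.group(1)](arg)
        buffer = buffer[end:]
    # Trailing commands: peel them off the end, keeping their order.
    tail = ""
    while True:
        m = TRAILING.search(buffer)
        if not m or m.group(1) not in HANDLERS:
            break
        tail = HANDLERS[m.group(1)](m.group(2)) + tail
        buffer = buffer[:m.start()]
    # Whatever remains is translated as one string.
    if buffer.strip():
        out += translate_text(buffer.strip())
    return out + tail
```

With these assumptions, the \chapter line above comes back as
\chapter{<Lexical analysis>\label{lexical}}: the title is translated on
its own and the label is passed through unchanged.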

> >   1) call get_leading_command, to handle a leading command
> >      If the paragraph begins with a command, call this command's subroutine
> >      with the paragraph as argument and append this translation to the
> >      translated buffer.
> >      Loop until there is no more leading command.
> >
> >   2) call get_trailing_command, to handle trailing commands (loop)
> >      while there are trailing commands, call these commands, and build
> >      a translated buffer to push at the end of the current paragraph.
> >   3) append the translation of the remaining paragraph (if any)
> >   4) append the translation of the trailing commands
>
> Should work mostly fine with Nicolas' book, but what are these trailing
> commands?

Here is a paragraph from the Python documentation:
A Python program is read by a \emph{parser}.  Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}

The \index commands here are trailing commands. They are translated
separately from the paragraph (in this case, they may be left untranslated).
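
As a rough Python illustration of what "trailing" means here (not the
Perl implementation): the hypothetical helper split_trailing_commands
peels commands off the end of a paragraph until plain text is reached. It
assumes each trailing command has a single argument without nested
braces, and it treats any command as a candidate, whereas a real module
would presumably only consider the commands it knows about.

```python
import re

# One command with a single brace-free argument at the very end of the text.
TRAILING_COMMAND = re.compile(r'\s*\\([A-Za-z]+)\{([^{}]*)\}\s*$')


def split_trailing_commands(paragraph):
    """Return (body, commands); commands is a list of (name, argument)."""
    commands = []
    while True:
        match = TRAILING_COMMAND.search(paragraph)
        if match is None:
            break
        # Commands are found last-first, so insert at the front
        # to keep them in document order.
        commands.insert(0, (match.group(1), match.group(2)))
        paragraph = paragraph[:match.start()]
    return paragraph, commands
```

Applied to the paragraph above, the body ends at "breaks a file into
tokens." and the three \index commands are returned separately, in their
original order.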



> >   * it should be possible to keep the separator between the commands
> >     (could be none, a space or a newline).
> >
> > One question: Is this separator important? For example, can I re-wrap:
> > \inputprotcode
> > \makeindex
> > \begin{document}
> > \myeqnspacing
> >    into:
> > \inputprotcode \makeindex \begin{document} \myeqnspacing
> > or even
> > \inputprotcode\makeindex\begin{document}\myeqnspacing
>
> Normally spaces, tabs and newlines are equivalent, but there are some
> circumstances where they are not, as when writing source code.
> It is likely that spaces do not matter in this book, so I would say
> to not bother if this is much easier for you.

That's the answer I was waiting for;)
(It can certainly be corrected later)



> But those macros aside, is a LaTeX module very different from XML
> or SGML?  It looks similar to me, there is a stack of environments,
> and the parser could be told what to do with these environments by
> a command like set_tags_kind (from Sgml.pm)

Yes, a parser is a parser. Maybe we could share the same interface. I'm
not sure we can do more, because for example these languages don't split
paragraphs the same way (in XML, empty lines have no special meaning, and
we have to look inside the buffer to find some <BR/>).

I've just started to have a look at Sgml. If anybody wants me to change
the name of the TeX subroutines in order to have the same interface, it's
OK.



I'm attaching my current implementation. I will clean it up and commit it.
It's mostly lacking customization of new commands and file inclusion.
I was able to parse bk2/bk2.tex with it.
There are currently two kinds of commands: untranslated and
translate_joined. It is possible to customize new commands on the command
line (or at the end of the file).

The commands I've added at the end of the file won't be committed (I've
arbitrarily set all encountered commands to one of these categories).

It is not ready AT ALL for texinfo.

Regards,
-- 
Nekral

[Attachment: TeX.pm.bz2]