[Po4a-devel]Design of the (La)TeX module

Mon, 6 Dec 2004 21:47:09 +0100

Hello,

Based on my reading about LaTeX and on Nicolas' book, I will change the
implementation a little.

It is much more formalized than the previous prototype. If it works, the
content of this mail may be used as documentation for this module.

1) The functions:
parse:
  * The parse function will only separate paragraphs. The separator is an
    empty line or a line beginning with a comment.
    What I'm calling a "paragraph" here is not a paragraph in the output
    document, but a block of code delimited by these separators.
  * The parse function will also remove the comments from this paragraph
    and keep them in a buffer (to be pushed as PO comments if there is
    a string to translate in this paragraph, or ignored otherwise).
    The comments will be ignored in the localized document.
    (This does not apply to lines beginning with a comment, which will just
    be pushed as is, like empty lines.)
  * Once a paragraph is found, the translation of the paragraph (built by
    translate_buffer) is pushed.
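
A minimal sketch of the paragraph splitting described above, written in
Python purely for illustration (it is not the module's code). The name
split_paragraphs and the tuple layout are assumptions made for this example.
It also assumes that a "%" preceded by a backslash is not a comment, and it
treats a whitespace-only line as paragraph content, since that point is
still an open question.

```python
import re

# An unescaped "%" starts a comment (simplification: "\\%" is not handled).
COMMENT = re.compile(r"(?<!\\)%")

def split_paragraphs(lines):
    """Return a list of (kind, text, comments) tuples.

    kind is "sep" for an empty line or a line beginning with a comment,
    and "para" for a block of lines found between two separators.
    """
    out, para, comments = [], [], []

    def flush():
        if para:
            out.append(("para", "\n".join(para), list(comments)))
        para.clear()
        comments.clear()

    for line in lines:
        if line == "" or line.startswith("%"):
            flush()
            out.append(("sep", line, []))   # pushed unchanged
            continue
        m = COMMENT.search(line)
        if m:
            # Keep the comment aside; it could become a PO comment later.
            comments.append(line[m.start() + 1:].strip())
            line = line[:m.start()].rstrip()
        para.append(line)
    flush()
    return out
```

Each "para" entry would then be handed to translate_buffer, while "sep"
entries are pushed unchanged.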

translate_buffer: return the translation of a buffer (typically a
paragraph or a subset of a paragraph)
  1) call get_leading_command to handle a leading command.
     If the paragraph begins with a command, call this command's subroutine
     with the paragraph as argument and append its translation to the
     translated buffer.
     Loop until there are no more leading commands.
  2) call get_trailing_command to handle trailing commands (loop).
     While there are trailing commands, call their subroutines and build
     a translated buffer to push at the end of the current paragraph.
  3) append the translation of the remaining paragraph (if any)
  4) append the translation of the trailing commands

  * it should be possible to keep the separator between the commands
    (could be none, a space or a newline).
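
The four steps above could be sketched as follows (Python, for illustration
only). This is a simplified assumption, not the planned implementation:
commands are copied through unchanged instead of being dispatched to their
subroutines, arguments are assumed not to contain nested braces, and the
translate callback is a hypothetical stand-in for the real translation
lookup. The separators around each command are kept as they are.

```python
import re

# Simplification: a command is \name, an optional *, then flat [..] / {..} groups.
CMD = r"\\[A-Za-z]+\*?(?:\[[^\[\]]*\]|\{[^{}]*\})*"
LEADING = re.compile(r"(" + CMD + r")(\s*)")
TRAILING = re.compile(r"(\s*)(" + CMD + r")(\s*)\Z")

def translate_buffer(buf, translate):
    head, tail = "", ""
    # 1) peel off leading commands, keeping the separator that follows each one
    while (m := LEADING.match(buf)):
        head += m.group(1) + m.group(2)
        buf = buf[m.end():]
    # 2) peel off trailing commands, keeping the separator that precedes each one
    while (m := TRAILING.search(buf)):
        tail = m.group(1) + m.group(2) + m.group(3) + tail
        buf = buf[:m.start()]
    # 3) translate what remains, 4) then append the trailing commands
    body = translate(buf) if buf else ""
    return head + body + tail
```

With str.upper standing in for the translation, the buffer
"\noindent" + newline + "Hello world \label{x}" comes back with only
"Hello world" changed.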

One question: Is this separator important? For example, can I re-wrap:
\inputprotcode
\makeindex
\begin{document}
\myeqnspacing
   into:
\inputprotcode \makeindex \begin{document} \myeqnspacing
or even
\inputprotcode\makeindex\begin{document}\myeqnspacing

parse_command:
  A helper for the command subroutines and for get_leading_command /
  get_trailing_command
  * takes a paragraph/buffer as argument
  * outputs the command name, an optional * (for \chapter*{foo}), an array
    of optional arguments (between []), an array of arguments (between {}),
    and the remaining paragraph/buffer.
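
A possible shape for parse_command, sketched in Python for illustration
only. The helper read_group is a name invented for this example. The sketch
accepts [] and {} groups in any order, which sidesteps the ordering question
rather than answering it, and it does not allow whitespace between a command
and its arguments.

```python
import re

def read_group(buf, pos, open_ch, close_ch):
    """Return (content, next_pos) for the balanced group opening at buf[pos]."""
    depth, i = 0, pos
    while i < len(buf):
        if buf[i] == "\\":
            i += 2              # skip an escaped character such as \{ or \}
            continue
        if buf[i] == open_ch:
            depth += 1
        elif buf[i] == close_ch:
            depth -= 1
            if depth == 0:
                return buf[pos + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced group starting at %d" % pos)

def parse_command(buf):
    """Return (name, star, optional_args, args, remaining), or None."""
    m = re.match(r"\\([A-Za-z]+)(\*?)", buf)
    if not m:
        return None
    name, star = m.group(1), m.group(2) == "*"
    pos, optional, required = m.end(), [], []
    while pos < len(buf) and buf[pos] in "[{":
        if buf[pos] == "[":
            content, pos = read_group(buf, pos, "[", "]")
            optional.append(content)
        else:
            content, pos = read_group(buf, pos, "{", "}")
            required.append(content)
    return name, star, optional, required, buf[pos:]
```

Counting depth is what lets an argument contain another command, as in
\chapter{foo\label{bar}}.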

Another question: Are optional arguments always before regular arguments?

get_leading_command:
  Is probably the same as parse_command.

get_trailing_command:
  If the given paragraph ends with a command, then extract this command and
  return the command name, etc. and the remaining paragraph.
  The parameter of a command can itself contain a command, so a simple
  regular expression won't be sufficient.
  To be recognized as a trailing command, the command must end with an
  argument (possibly an optional one), or must not have any argument.
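
One way to find a trailing command without relying on a single regular
expression, sketched in Python for illustration only: try each command start
from left to right and accept the first one whose balanced arguments run
exactly to the end of the buffer. The helper skip_group and the return shape
(remaining, command_text) are assumptions made for this example, and "\\"
line breaks are not handled.

```python
import re

CMD_START = re.compile(r"\\[A-Za-z]+\*?")
PAIRS = {"[": "]", "{": "}"}

def skip_group(buf, pos):
    """Return the index just past the balanced group opening at buf[pos], or -1."""
    open_ch, close_ch = buf[pos], PAIRS[buf[pos]]
    depth, i = 0, pos
    while i < len(buf):
        if buf[i] == "\\":
            i += 2              # skip an escaped character
            continue
        if buf[i] == open_ch:
            depth += 1
        elif buf[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1

def get_trailing_command(buf):
    """Return (remaining, command_text), or (buf, None) if there is none."""
    for m in CMD_START.finditer(buf):
        pos = m.end()
        # Consume every argument group that directly follows the name.
        while 0 <= pos < len(buf) and buf[pos] in PAIRS:
            pos = skip_group(buf, pos)
        if pos == len(buf):     # the command runs to the end of the buffer
            return buf[:m.start()], buf[m.start():]
    return buf, None
```

Scanning from the left means the outermost command is found first, so the
\label inside \chapter{foo\label{bar}} is never mistaken for the trailing
command.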

I've read that a command is a \ followed by a string of lower and/or
uppercase letters or a \ followed by a single nonletter.

I will probably support only the first case and not the second (with some
exceptions).

command subroutines:
  * commands is a hash of subroutines with the command name as a key.
  * Each command subroutine takes the paragraph/buffer as argument (and
    $self,...).
    The arguments may already have been separated by parse_command and
    provided as arguments.
    The outputs are a translated string and the remaining paragraph/buffer.
  * To translate the content of a parameter, translate_buffer could
    be called.
    This is how I handle \chapter{foo\label{bar}}
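
The hash of command subroutines and the recursive call to translate_buffer
could look roughly like this (Python, for illustration only). Several things
here are assumptions: the handlers receive arguments that were already
separated, they return only the translated string (not the remaining
buffer), the inner translate_buffer handles only flat arguments, and the
handler names and the entries in COMMANDS are hypothetical.

```python
import re

# Flat commands only: \name followed by {..} groups without nested braces.
CMD = re.compile(r"\\([A-Za-z]+)((?:\{[^{}]*\})*)")

def keep_as_is(name, args, translate):
    """Rebuild the command without translating anything (e.g. \\label)."""
    return "\\" + name + "".join("{%s}" % a for a in args)

def translate_args(name, args, translate):
    """Rebuild the command, translating the content of every argument."""
    done = [translate_buffer(a, translate) for a in args]
    return "\\" + name + "".join("{%s}" % a for a in done)

# Hash of subroutines keyed by command name (entries are examples only).
COMMANDS = {"chapter": translate_args, "emph": translate_args, "label": keep_as_is}

def translate_buffer(buf, translate):
    out, pos = [], 0
    for m in CMD.finditer(buf):
        if m.start() > pos:
            out.append(translate(buf[pos:m.start()]))
        args = re.findall(r"\{([^{}]*)\}", m.group(2))
        handler = COMMANDS.get(m.group(1), keep_as_is)
        out.append(handler(m.group(1), args, translate))
        pos = m.end()
    if pos < len(buf):
        out.append(translate(buf[pos:]))
    return "".join(out)
```

Calling the chapter handler with the single argument foo\label{bar}
translates "foo" and leaves \label{bar} untouched, because the handler
recurses into translate_buffer for its argument.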

2) environments
  * when a \begin is encountered, push the environment name onto a stack.
    This should allow nested environments.
    Do you think it would be useful to have a full stack of the
    environments?
    It may be better than always handling the paragraph (and shifting
    lines) in a separate subroutine.
  * when a \end is encountered, pop this environment (some verifications
    could be done)
  * The translate_buffer should ensure that if the environment stack contains
    a verbatim, then the no-wrap flag is set, and other things like this
    (e.g. setting the type)
  * some environments may need special handling; a more complicated
    subroutine could handle the current paragraph, and possibly shift some
    lines.
  * the environment subroutines could be kept separate from the commands
    hash. Do you think they should be separated?
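
The environment stack could be sketched like this (Python, for illustration
only). The class name EnvStack, the NO_WRAP set and the check raised on a
mismatched \end are assumptions made for this example. Inside a verbatim
environment, everything except the matching \end is treated as literal text.

```python
import re

ENV = re.compile(r"\\(begin|end)\{([^{}]+)\}")
NO_WRAP = {"verbatim"}   # environments whose content must not be re-wrapped

class EnvStack:
    def __init__(self):
        self.stack = []

    def update(self, paragraph):
        """Push on every \\begin{...} and pop on every \\end{...}."""
        for kind, name in ENV.findall(paragraph):
            in_verbatim = bool(self.stack) and self.stack[-1] in NO_WRAP
            if in_verbatim and not (kind == "end" and name == self.stack[-1]):
                continue    # literal text inside a verbatim environment
            if kind == "begin":
                self.stack.append(name)
            elif not self.stack or self.stack[-1] != name:
                raise ValueError("unexpected \\end{%s}" % name)
            else:
                self.stack.pop()

    def no_wrap(self):
        """True if any enclosing environment forbids wrapping."""
        return any(env in NO_WRAP for env in self.stack)
```

translate_buffer could query no_wrap() before deciding whether the current
paragraph may be re-wrapped.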

3) Some questions:
  * Are there commands that need to be translated?
    For example, somebody may want to change \noindent into a
    \localized_noindent.

  * Does a line containing only spaces break a paragraph?

4) \newcommand

I had a look at Nicolas' book.
I think the parser explained above can do the trick for it.

The parser could receive in argument the class file (lm.cls), or the file
name could be deduced from the \documentclass{lm} command.

Every line of this file is ignored, except lines beginning with:
"% po4a:"
(In fact, these lines could also be specified in another file, but it may
be easier to keep the definitions of the commands and the description of
how po4a should deal with them in the same file.)

Here is my proposal for the format of these lines:
% po4a: new_command alias other_command
or
% po4a: new_command x y z t
where
  * x is the number of optional arguments (between [])
      0 - no optional argument
     -1 - variable (can it be?)
      n - maximum number of optional arguments (maybe -1 will be easier to use)
  * y is the number of arguments
    maybe x and y are not needed
  * z array of indexes of the optional arguments that have to be translated
     -1 - all optional arguments should be translated
      0 - none
  1,3,7 - the 1st, 3rd and 7th arguments should be translated
  * t array of indexes of the arguments that have to be translated
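
A sketch of how such lines could be read (Python, for illustration only).
It assumes that "new_command" in the proposal stands for the name of the
command being declared, and that t follows the same -1 / 0 / list convention
as z. The function names and dictionary keys are invented for this example.

```python
PREFIX = "% po4a:"

def parse_index_list(field):
    """-1 means all, 0 means none, otherwise comma-separated 1-based indexes."""
    if field == "-1":
        return "all"
    if field == "0":
        return []
    return [int(i) for i in field.split(",")]

def parse_po4a_line(line):
    """Return a dict describing the command, or None for any other line."""
    if not line.startswith(PREFIX):
        return None                     # every other line is ignored
    fields = line[len(PREFIX):].split()
    if len(fields) == 3 and fields[1] == "alias":
        return {"command": fields[0], "alias": fields[2]}
    if len(fields) == 5:
        name, x, y, z, t = fields
        return {
            "command": name,
            "optional": int(x),         # number of [] arguments, -1 = variable
            "required": int(y),         # number of {} arguments
            "translate_optional": parse_index_list(z),
            "translate_required": parse_index_list(t),
        }
    raise ValueError("malformed po4a line: %r" % line)
```

For example, "% po4a: chapter 1 1 0 1" would declare a command with one
optional and one regular argument, of which only the regular one is
translated.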

5) User point of view (my goals)
  Nicolas should only have to provide the bk2.tex file to the parser (and
  a class file).
  When \include{ch01/ch01} is encountered, that file will be
  parsed.

  No file should require a modification (except the class file or a file
  containing only the "% po4a: " commands).

6) Conclusion
  So, it virtually works in my dreams and everything seems rather clear and
  simple;)

  Some points were left out or forgotten here (notably the references: for
  now I only plan to keep one reference per paragraph, but this can
  certainly be fixed later).

Let's start coding,
-- 
Nekral