[Po4a-devel] parsing asciidoc styles

D. Barbier bouzim at gmail.com
Thu Sep 27 15:35:51 UTC 2012


On 2012/9/27 Anders Nawroth wrote:
> Ah, the correct term for all of this is an AttributeList, see:
> http://www.methods.co.nz/asciidoc/userguide.html#X21
> Which then has very different uses ...
>
> The question is if we shouldn't just support anything in there and
> make everything between the [] a no-wrap string to be translated?
> Maybe keeping some specific support for verse and quote et al which
> may benefit from that.
> It just seems quite hard to perfectly parse all the ways
> AttributeLists can be used in AsciiDoc.
> (we don't want to re-implement AsciiDoc, right! ;-)
>
> Keeping the no-extra-arguments built-ins as non-translatable string,
> just like now.
>
> WDYT?

My understanding so far is that there are two distinct AttributeList
usages: inline and block.  Inline text is part of msgid, so there is
nothing special.  The most important usage is when it is a block
element.  In that case, yes, the text enclosed between brackets must
not be wrapped.
How do we detect a block AttributeList?  I believe that
   m/^\[.*\]\s*$/
is quite sane.  The AsciiDoc parser goes haywire anyway if there are
brackets inside brackets; try for instance (my mailer wraps the first
line, it must of course not be wrapped)

  [verse, "William Blake", "from Auguries of Innocence"]and
another[verse, "William Blake", "from Auguries of Innocence"]
  To see a world in a grain of sand,
  And a heaven in a wild flower,
  Hold infinity in the palm of your hand,
  And eternity in an hour.
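
The detection test proposed above can be tried in isolation.  This is
only a minimal sketch, not the po4a code: the function name
is_block_attribute_list is made up for illustration, and only the
pattern itself comes from this message.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the whole line is one bracketed
# expression, optionally followed by trailing whitespace.
sub is_block_attribute_list {
    my ($line) = @_;
    return $line =~ m/^\[.*\]\s*$/ ? 1 : 0;
}

for my $line ('[verse, "William Blake", "from Auguries of Innocence"]',
              'Inline [text] in a paragraph') {
    printf "%-60s %s\n", $line,
        is_block_attribute_list($line) ? 'block' : 'not block';
}
```

Because .* is greedy, a line that starts with [ and ends with ] is
accepted even when it contains further brackets in between, which is
the situation the example above exercises.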

The only special case is 'verse', because the subsequent paragraph
must not be wrapped.  To handle quote characters around the style
name, we can use
        } elsif ($asciidoc and not defined $self->{verbatim} and
                 ($line =~ m/^\[(['"]?)(verse|quote)\1, +(.*)\]$/)) {
and shift the capture variables used after it by one ($1 becomes $2
and $2 becomes $3).
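
To see where the captures end up, here is a small standalone sketch.
It assumes that the earlier pattern captured the style in $1 and the
remaining attributes in $2; the function name parse_verse_or_quote and
the returned hash keys are invented for illustration and are not part
of po4a.

```perl
use strict;
use warnings;

# Hypothetical helper: match a verse/quote block attribute list whose
# style name may be wrapped in single or double quotes.
sub parse_verse_or_quote {
    my ($line) = @_;
    if ($line =~ m/^\[(['"]?)(verse|quote)\1, +(.*)\]$/) {
        # $1 is the optional quote character, so the style is now in
        # $2 and the rest of the attribute list in $3.
        return { type => $2, attributes => $3 };
    }
    return;
}

my $res = parse_verse_or_quote('["verse", "William Blake"]');
print "$res->{type}: $res->{attributes}\n" if $res;
```

The backreference \1 requires the closing quote to be the same
character as the opening one, so ["verse'] is rejected.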

So far so good.  But if you look at the code just below, it contains
special cases to handle icons and captions, to present only
translatable text in PO files, without extra markup.  If we want to
get that far, this is doable but requires more work.  If we only want
to write bracketed expressions in msgid, this is pretty trivial and
can be done very quickly (and in that case I will remove the code to
deal with icons/caption).
To summarize, there are at least 3 options:
  1. Copy [] block attribute lists into msgid as is.
  2. Extract the first positional parameter; this is the msgid type;
     everything else is put into the POT file as attributes on a
     single line (this is what is currently implemented, for verse
     and quote only).
  3. Fully parse attribute lists to handle positional and named
     parameters; add a mechanism to let users say exactly which
     arguments are translatable, as is done in the TeX module.
     Attributes are split (and unquoted) and each translatable
     attribute is written as a single msgid.
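
A rough sketch of what option 3 could look like, under several
assumptions: the function names, the %translatable table and the rule
that every positional parameter after the first is translatable are
all invented here for illustration.  Splitting relies on parse_line
from the core Text::ParseWords module, which honours quotes and
removes them.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical user setting: named attributes that carry translatable text.
my %translatable = (caption => 1, title => 1);

# Split a block attribute list into positional and named parameters.
# Returns (\@positional, \%named), or nothing if $line is not a block
# attribute list.
sub parse_attribute_list {
    my ($line) = @_;
    return unless $line =~ m/^\[(.*)\]\s*$/;
    my $inner = $1;
    my (@positional, %named);
    # Split on commas that are outside quotes; quotes are stripped.
    for my $item (parse_line('\s*,\s*', 0, $inner)) {
        next unless defined $item;
        if ($item =~ m/^(\w[\w-]*)=(.*)$/s) {
            $named{$1} = $2;
        } else {
            push @positional, $item;
        }
    }
    return (\@positional, \%named);
}

# Strings that would each become a separate msgid.  Assumption: the
# first positional parameter is the type and is not translated.
sub translatable_strings {
    my ($positional, $named) = @_;
    my @args = @{$positional}[1 .. $#{$positional}];
    my @keys = grep { $translatable{$_} } sort keys %$named;
    return (@args, map { $named->{$_} } @keys);
}

my ($pos, $named) = parse_attribute_list(
    '[verse, "William Blake", "from Auguries of Innocence"]');
print "msgid: $_\n" for translatable_strings($pos, $named);
```

One known limitation of this sketch: once the quotes are removed, a
quoted positional value that itself contains name=value text is
indistinguishable from a named parameter.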

I believe that 3 is the way to go.

(Note to myself: we have to support // comments so that they can be
used to give directives to translators).

Denis


