[Po4a-devel]Some comments

Jordi Vilalta jvprat@wanadoo.es
Fri, 4 Jun 2004 13:45:31 +0200 (CEST)


Hello,

On Mon, 24 May 2004, Martin Quinson wrote:
> [...]
> On Fri, May 07, 2004 at 03:43:38PM +0200, Jordi Vilalta wrote:
> > > po4a skips the generation of msgids containing only an entity (or only tags).
> > > It will now issue a warning when such optimizations are done. Thanks for the
> > > report. [At least this is what I planned, but msgids containing spaces
> > > along with entities were not detected. This is also fixed.]
> > 
> > Now it seems to skip this kind of msgid (the version I tried some days
> > ago didn't), but it behaves irregularly. I've done the following
> > (meaningless) test:
> 
> When I redo the test, I get what I expect:
> ====[/tmp/a]====
> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
> <!ENTITY chap SYSTEM "chapter1.xml">
> <!ENTITY chap2 SYSTEM "chapter2.xml">
> <!ENTITY aaa "contens of aaa">
> <!ENTITY bbb "contens of bbb">
> <!ENTITY ccc "contens of ccc">
> ]>
> 
> <book>
>         &chap0;
>         &chap;
>         &chap2;
>         &aaa;
>         &chap3;
>         &bbb;
>         &chap;
>         &ccc;
>         &aaa;
> </book>
> ====[/tmp/chapter1.xml]====
> [content of chapt1]
> ====[/tmp/chapter2.xml]====
> [content of chapt2]
> ====[generated po file]====
> # SOME DESCRIPTIVE TITLE
> # Copyright (C) YEAR Free Software Foundation, Inc.
> # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
> # 
> #, fuzzy
> msgid ""
> msgstr ""
> "Project-Id-Version: PACKAGE VERSION\n"
> "POT-Creation-Date: 2004-05-24 14:10-0700\n"
> "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
> "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
> "Language-Team: LANGUAGE <LL@li.org>\n"
> "MIME-Version: 1.0\n"
> "Content-Type: text/plain; charset=CHARSET\n"
> "Content-Transfer-Encoding: ENCODING"
> 
> # type: definition of entity &aaa;
> #, no-wrap
> msgid "contens of aaa"
> msgstr ""
> 
> # type: definition of entity &bbb;
> #, no-wrap
> msgid "contens of bbb"
> msgstr ""
> 
> # type: definition of entity &ccc;
> #, no-wrap
> msgid "contens of ccc"
> msgstr ""
> 
> # type: <book></book>
> msgid ""
> "&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
> "of chapt1] &ccc; &aaa;"
> msgstr ""
> ====[end of files]====
> 
> The type line looks ok to me, and there is no reference line for entity
> definitions. That way, it is not broken ;)

Well, the problem here was with the chapter?.xml files. With your files I
get the same result as you, but when I change their content to something like:

<chapter><title>ch.1</title>
<para>content 1</para>
</chapter>

I get this (crazy) po file as output:

...
# type: <title></title>
#: a.xml:12 chapter2.xml:1
msgid "ch.1"
msgstr ""

# type: <para></para>
#: a.xml:12 chapter2.xml:1
msgid "content 1"
msgstr ""

# type: <title></title>
#: chapter1.xml:1
msgid "ch.2"
msgstr ""

# type: <para></para>
#: chapter1.xml:1
msgid "content 2"
msgstr ""

# type: </chapter><chapter>
#: chapter2.xml:1
msgid "&aaa; &chap3; &bbb;"
msgstr ""

# type: </chapter></book>
msgid "&ccc; &aaa;"
msgstr ""

It seems that the content of an included file is inserted into the main
file and parsed there, which causes this behavior (and the wrong type lines).
Also, I don't like the substitution of the content here:

"&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
"of chapt1] &ccc; &aaa;"

As you can see, the content of chapter1 appears twice (so it must be
translated twice). Instead, I think inclusion entities should be treated
like substitution entities: the content is translated once, and the
references are left as they are. For example, &aaa; appears twice in this
msgid, but its content is translated only once.
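
To make the proposal concrete, here is a minimal Python sketch of the
"translate once, leave references alone" idea. It is not po4a code: the
ENTITIES table, the BODY string and the extract_msgids helper are all
hypothetical names made up for illustration, and the entity contents are
copied from the test files quoted above.

```python
import re

# Hypothetical data mirroring the test files quoted above.
# This is NOT po4a's internal representation.
ENTITIES = {
    "chap": "[content of chapt1]",   # inclusion entity (SYSTEM "chapter1.xml")
    "chap2": "[content of chapt2]",  # inclusion entity (SYSTEM "chapter2.xml")
    "aaa": "contens of aaa",         # substitution entities
    "bbb": "contens of bbb",
    "ccc": "contens of ccc",
}

BODY = "&chap0; &chap; &chap2; &aaa; &chap3; &bbb; &chap; &ccc; &aaa;"

# Matches an entity reference such as &chap2; and captures its name.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.:-]*);")


def extract_msgids(body, entities):
    """Return one msgid per known entity referenced in body (first use only),
    followed by the body itself with every reference left unexpanded."""
    msgids = []
    seen = set()
    for name in ENTITY_REF.findall(body):
        if name in entities and name not in seen:
            seen.add(name)
            msgids.append(entities[name])
    msgids.append(body)
    return msgids


if __name__ == "__main__":
    for msgid in extract_msgids(BODY, ENTITIES):
        print(f'msgid "{msgid}"')
```

With this approach the content of chapter1 shows up as a msgid once, no
matter how many times &chap; is referenced, and unknown entities such as
&chap0; simply stay in the body msgid.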

I've also tried to complicate things a little more: I put some tags into a
substitution entity (I've used this in real documents), and then the
entity disappears from the generated po.

> 
> > Looking at the contents of the msgids, it seems that it skips only the
> > inclusion entities that it knows, and gives up on the "substitution"
> > entities:
> 
> No, we substitute only inclusion entities, and never the substitution ones.
> This is exactly what I wanted, since expanding them would force the
> translator to update his work each time the &version; entity is updated,
> which is exactly contrary to the philosophy of this mechanism.
> 
> > I think there are 2 alternative ways to treat these cases better:
> >   1) Exclude all entities-only messages (any number, known or unknown)
> >   2) Include the whole messages that have more than 1 entity (known or 
> >      unknown), because in some languages it may be interesting to change 
> >      the order of some of them.
> 
> As reflected by the source code, the second option is the selected one,
> for the reason you give ;)
> 
> > hmmm, now I was thinking about the standard entities that define special
> > characters, such as &acute;, and I've seen that they're also excluded if
> > there's something like <title>&Acute;</title>. Seeing this, I prefer not to
> > exclude any entities. In some cases it can be a little annoying for the
> > translators, but otherwise there could be some untranslatable strings.
> 
> hmm. This example looks a bit artificial, doesn't it? Anyway, I added an
> 'include-all' option to the module to disable those optimisations.
>
> Passing options to modules is one of the novelties introduced in the CVS
> version. For example:
> po4a-gettextize -t sgml -o include-all -m bla.sgml -p bla.pot

Interesting :)
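
As a way to pin down the rule discussed in this thread (a msgid holding a
single entity is skipped, a msgid with several entities is kept so their
order can be changed, and include-all disables the skipping), here is a
minimal Python sketch. It only reflects my reading of the discussion: the
should_skip function and its include_all parameter are hypothetical and do
not come from the po4a source.

```python
import re

# A msgid made of exactly one entity reference, optionally surrounded by
# whitespace. Hypothetical pattern, not taken from po4a.
SINGLE_ENTITY = re.compile(r"\s*&[A-Za-z_][\w.:-]*;\s*")


def should_skip(msgid, include_all=False):
    """Return True when the msgid would be left out of the po file.

    include_all mimics the effect described for the 'include-all' option:
    when set, nothing is skipped.
    """
    if include_all:
        return False
    return SINGLE_ENTITY.fullmatch(msgid) is not None


if __name__ == "__main__":
    for text in [" &chap; ", "&aaa; &chap3; &bbb;", "ch.1"]:
        print(repr(text), should_skip(text))
```

Under these assumptions <title>&Acute;</title> is skipped by default, and
passing include_all=True keeps it for the translator.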

[...]

Regards,

Jordi Vilalta