[Po4a-devel]Some comments
Jordi Vilalta
jvprat@wanadoo.es
Fri, 4 Jun 2004 13:45:31 +0200 (CEST)
Hello,
On Mon, 24 May 2004, Martin Quinson wrote:
> [...]
> On Fri, May 07, 2004 at 03:43:38PM +0200, Jordi Vilalta wrote:
> > > po4a skips the generation of msgid containing an entity only (or tags only).
> > > It will now issue a warning when such optimizations are done. Thanks for the
> > > report. [At least this is what I planned, but the msgid containing spaces
> > > along with entities were not detected. This is also fixed]
> >
> > Now it seems to skip this kind of msgids (the version I tried some days
> > ago didn't), but it has an irregular behavior. I've done the following
> > (meaningless) test:
>
> When I redo the test, I get something that matches what I expect:
> ====[/tmp/a]====
> <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
> <!ENTITY chap SYSTEM "chapter1.xml">
> <!ENTITY chap2 SYSTEM "chapter2.xml">
> <!ENTITY aaa "contens of aaa">
> <!ENTITY bbb "contens of bbb">
> <!ENTITY ccc "contens of ccc">
> ]>
>
> <book>
> &chap0;
> &chap;
> &chap2;
> &aaa;
> &chap3;
> &bbb;
> &chap;
> &ccc;
> &aaa;
> </book>
> ====[/tmp/chapter1.xml]====
> [content of chapt1]
> ====[/tmp/chapter2.xml]====
> [content of chapt2]
> ====[generated po file]====
> # SOME DESCRIPTIVE TITLE
> # Copyright (C) YEAR Free Software Foundation, Inc.
> # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
> #
> #, fuzzy
> msgid ""
> msgstr ""
> "Project-Id-Version: PACKAGE VERSION\n"
> "POT-Creation-Date: 2004-05-24 14:10-0700\n"
> "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
> "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
> "Language-Team: LANGUAGE <LL@li.org>\n"
> "MIME-Version: 1.0\n"
> "Content-Type: text/plain; charset=CHARSET\n"
> "Content-Transfer-Encoding: ENCODING"
>
> # type: definition of entity &aaa;
> #, no-wrap
> msgid "contens of aaa"
> msgstr ""
>
> # type: definition of entity &bbb;
> #, no-wrap
> msgid "contens of bbb"
> msgstr ""
>
> # type: definition of entity &ccc;
> #, no-wrap
> msgid "contens of ccc"
> msgstr ""
>
> # type: <book></book>
> msgid ""
> "&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
> "of chapt1] &ccc; &aaa;"
> msgstr ""
> ====[end of files]====
>
> The type line looks ok to me, and there is no reference line for entity
> definitions. That way, it is not broken ;)
Well, the problem here was with the chapter?.xml files. With your files I
get the same result as you, but when changing their content to:
<chapter><title>ch.1</title>
<para>content 1</para>
</chapter>
I get this (mad) output po file:
...
# type: <title></title>
#: a.xml:12 chapter2.xml:1
msgid "ch.1"
msgstr ""
# type: <para></para>
#: a.xml:12 chapter2.xml:1
msgid "content 1"
msgstr ""
# type: <title></title>
#: chapter1.xml:1
msgid "ch.2"
msgstr ""
# type: <para></para>
#: chapter1.xml:1
msgid "content 2"
msgstr ""
# type: </chapter><chapter>
#: chapter2.xml:1
msgid "&aaa; &chap3; &bbb;"
msgstr ""
# type: </chapter></book>
msgid "&ccc; &aaa;"
msgstr ""
It seems that the content of an included file is inserted and then parsed
as if it were part of the main file, which gives this behavior (and the
wrong type lines).
Also, I don't like the substitution of the content here:
"&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
"of chapt1] &ccc; &aaa;"
As you can see, the content of chapter1 appears twice (so it must be
translated twice). Instead, I think that inclusion entities should be
treated like substitution entities: the content is translated once, and
the references are left as they are. For example, &aaa; appears twice in
this msgid, but its content is translated only once.
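To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not po4a's code: the function names (parse_entities, expand_inclusions, extract_once) are hypothetical, the regular expressions only handle the simple declarations used in this test, and the included files are passed in as a plain dictionary.

```python
import re

# Hypothetical helpers for illustration only; this is not how po4a is written.
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+(SYSTEM\s+)?"([^"]*)"\s*>')
ENTITY_REF = re.compile(r'&(\w+);')


def parse_entities(doctype_subset):
    """Split declarations into inclusion (SYSTEM) and substitution entities."""
    inclusion, substitution = {}, {}
    for name, system, value in ENTITY_DECL.findall(doctype_subset):
        (inclusion if system else substitution)[name] = value
    return inclusion, substitution


def expand_inclusions(body, files, inclusion):
    """Behaviour described above: inline the file at every known reference."""
    def repl(match):
        name = match.group(1)
        if name in inclusion and inclusion[name] in files:
            return files[inclusion[name]]
        return match.group(0)  # unknown or substitution entity: keep as is
    return ENTITY_REF.sub(repl, body)


def extract_once(body, files, inclusion, substitution):
    """Proposed behaviour: one msgid per entity content, references kept."""
    msgids, seen = [], set()
    for name in ENTITY_REF.findall(body):
        if name in seen:
            continue
        seen.add(name)
        if name in substitution:
            msgids.append(substitution[name])
        elif name in inclusion and inclusion[name] in files:
            msgids.append(files[inclusion[name]])
    # The body itself keeps every reference untouched.
    msgids.append(" ".join(body.split()))
    return msgids
```

With the test document above, expand_inclusions() yields a string in which "[content of chapt1]" occurs twice, while extract_once() returns that content as a single msgid and leaves both &chap; references in the last one.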
I've also tried to complicate things a little more: I put some tags into
a substitution entity (as I do in real documents), and then the entity
disappears from the generated po.
>
> > When looking at the contents of the msgids, it seems that it skips only the
> > inclusion entities that it knows, and gives the "substitution" entities
> > up:
>
> No, we substitute only inclusion entities, and never the substitution ones.
> This is exactly what I wanted, since expanding them would force the
> translator to update his work each time the &version; entity is updated,
> which is exactly contrary to the philosophy of this mechanism.
>
> > I think there are 2 alternative ways to treat these cases better:
> > 1) Exclude all entities-only messages (any number, known or unknown)
> > 2) Include the whole messages that have more than 1 entity (known or
> > unknown), because in some languages it may be interesting to change
> > the order of some of them.
>
> As reflected by the source code, the second option is the selected one.
> For the argument you give ;)
>
> > hmmm, now I was thinking about the standard entities that define special
> > characters, as &acute; and I've seen that they're also excluded if there's
> > something like <title>&Acute;</title>. Seeing this, I prefer not to
> > exclude any entities. In some cases it can be a little annoying for the
> > translators, but otherwise there could be some untranslatable strings.
>
> hmm. This example looks a bit artificial, doesn't it? Anyway. I added an
> 'include-all' option to the module to disable those optimisations.
>
> Passing options to modules is one of the novelties introduced in the CVS
> version. For example, it would be:
> po4a-gettextize -t sgml -o include-all -m bla.sgml -p bla.pot
Interesting :)
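Just to check that I understand the rule, this is how I would sketch it in Python. It only mirrors the behaviour described in this thread, not the actual code of the sgml module: the should_extract name and both regular expressions are my assumptions, and the include_all parameter stands in for the 'include-all' option.

```python
import re

# Hypothetical patterns: one entity reference (&name;) and one tag (<...>).
ENTITY = re.compile(r'&[\w.#-]+;')
TAG = re.compile(r'<[^<>]+>')


def should_extract(msgid, include_all=False):
    """Return True if the string should be offered for translation."""
    if include_all:
        return True  # optimisations disabled: nothing is skipped
    entities = ENTITY.findall(msgid)
    # What remains once tags, entities and surrounding spaces are removed.
    leftover = ENTITY.sub('', TAG.sub('', msgid)).strip()
    if leftover:
        return True  # there is real text to translate
    # Entities or tags only: keep the message only if there are several
    # entities, since a translator may want to reorder them.
    return len(entities) > 1
```

So "&chap;" alone (with or without spaces) is skipped, "&aaa; &chap3; &bbb;" is kept, and <title>&Acute;</title> is only kept when include_all is set.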
[...]
Regards,
Jordi Vilalta