[Po4a-devel]Some comments

Martin Quinson mquinson@ens-lyon.fr
Fri, 4 Jun 2004 14:14:34 -0700



On Fri, Jun 04, 2004 at 01:45:31PM +0200, Jordi Vilalta wrote:
> Hello,
>
> On Mon, 24 May 2004, Martin Quinson wrote:
> > [...]
> > On Fri, May 07, 2004 at 03:43:38PM +0200, Jordi Vilalta wrote:
> > > > po4a skips the generation of msgid containing an entity only (or tags only).
> > > > It will now issue a warning when such optimizations are done. Thanks for the
> > > > report. [At least this is what I planned, but the msgid containing spaces
> > > > along with entities were not detected. This is also fixed]
> > >
> > > Now it seems to skip this kind of msgids (the version I tried some days
> > > ago didn't), but it has an irregular behavior. I've done the following
> > > (meaningless) test:
> >
> > When I redo the test, I get something corresponding to what I expect:
> > ====[/tmp/a]====
> > <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> > "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
> > <!ENTITY chap SYSTEM "chapter1.xml">
> > <!ENTITY chap2 SYSTEM "chapter2.xml">
> > <!ENTITY aaa "contens of aaa">
> > <!ENTITY bbb "contens of bbb">
> > <!ENTITY ccc "contens of ccc">
> > ]>
> >
> > <book>
> >         &chap0;
> >         &chap;
> >         &chap2;
> >         &aaa;
> >         &chap3;
> >         &bbb;
> >         &chap;
> >         &ccc;
> >         &aaa;
> > </book>
> > ====[/tmp/chapter1.xml]====
> > [content of chapt1]
> > ====[/tmp/chapter2.xml]====
> > [content of chapt2]
> > ====[generated po file]====
> > # SOME DESCRIPTIVE TITLE
> > # Copyright (C) YEAR Free Software Foundation, Inc.
> > # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
> > #
> > #, fuzzy
> > msgid ""
> > msgstr ""
> > "Project-Id-Version: PACKAGE VERSION\n"
> > "POT-Creation-Date: 2004-05-24 14:10-0700\n"
> > "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
> > "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
> > "Language-Team: LANGUAGE <LL@li.org>\n"
> > "MIME-Version: 1.0\n"
> > "Content-Type: text/plain; charset=CHARSET\n"
> > "Content-Transfer-Encoding: ENCODING"
> >
> > # type: definition of entity &aaa;
> > #, no-wrap
> > msgid "contens of aaa"
> > msgstr ""
> >
> > # type: definition of entity &bbb;
> > #, no-wrap
> > msgid "contens of bbb"
> > msgstr ""
> >
> > # type: definition of entity &ccc;
> > #, no-wrap
> > msgid "contens of ccc"
> > msgstr ""
> >
> > # type: <book></book>
> > msgid ""
> > "&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
> > "of chapt1] &ccc; &aaa;"
> > msgstr ""
> > ====[end of files]====
> >
> > The type line looks ok to me, and there is no reference line for entity
> > definition. That way, it is not broken ;)
>
> Well, the problem here was with the chapter?.xml files. With your files I
> get the same result as you, but when changing their content to:
>
> <chapter><title>ch.1</title>
> <para>content 1</para>
> </chapter>
>
> I get this (mad) output po file:
>
> ...
> # type: <title></title>
> #: a.xml:12 chapter2.xml:1
> msgid "ch.1"
> msgstr ""
>
> # type: <para></para>
> #: a.xml:12 chapter2.xml:1
> msgid "content 1"
> msgstr ""
>
> # type: <title></title>
> #: chapter1.xml:1
> msgid "ch.2"
> msgstr ""
>
> # type: <para></para>
> #: chapter1.xml:1
> msgid "content 2"
> msgstr ""
>
> # type: </chapter><chapter>
> #: chapter2.xml:1
> msgid "&aaa; &chap3; &bbb;"
> msgstr ""
>
> # type: </chapter></book>
> msgid "&ccc; &aaa;"
> msgstr ""
>
> It seems that when inserting the content of the included file, it's parsed
> in the main file, and it gets this behavior (and the wrong type lines).

Yes, the included file is inlined into the main document before parsing.
This is needed to handle conditional inclusions. And yes, the reference is a
bit wrong in that case: it is off by a few lines. I don't know where that
comes from; it is tracked as #300589 on alioth.
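
To make "inlined before parsing" concrete, here is a minimal Python sketch of the idea. It is not po4a's code (po4a is written in Perl): the helper name `inline_system_entities`, the regular expressions and the `files` dictionary standing in for the filesystem are all assumptions made for illustration.

```python
import re

# Hypothetical helper, NOT po4a's actual code: expand external (SYSTEM)
# entities textually, so that the parser later sees a single stream.
SYSTEM_ENTITY = re.compile(r'<!ENTITY\s+(\w+)\s+SYSTEM\s+"([^"]+)"\s*>')

def inline_system_entities(master, files):
    """Replace each &name; declared as SYSTEM with the named file's text.

    `files` maps file names to contents (a stand-in for the filesystem).
    Internal entities and undeclared entities are left untouched.
    """
    declared = dict(SYSTEM_ENTITY.findall(master))

    def expand(match):
        name = match.group(1)
        if name in declared and declared[name] in files:
            return files[declared[name]].strip()
        return match.group(0)

    return re.sub(r'&(\w+);', expand, master)

master = '''<!DOCTYPE article [
<!ENTITY chap SYSTEM "chapter1.xml">
<!ENTITY aaa "contens of aaa">
]>
<book>
&chap0;
&chap;
&aaa;
</book>'''

files = {"chapter1.xml": "<chapter><title>ch.1</title></chapter>\n"}
result = inline_system_entities(master, files)
print(result)
```

In this sketch only `&chap;` is expanded; `&chap0;` (undeclared) and `&aaa;` (a substitution entity) stay as they are. Once the text is merged like this, the parser no longer sees file boundaries, so any per-file line reference has to be reconstructed afterwards.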

But I don't get why you say it's a mad file. The type lines are perfectly
valid. Take "&aaa; &chap3; &bbb;", for example. It is placed outside any tag,
and ends up between a "</chapter>" and a "<chapter>". So it is not po4a that
gets the type line wrong; it is the file that is wrongly formatted... If you
do not agree, what would a valid file look like to you?
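
As a rough illustration of why that type line comes out as "</chapter><chapter>", here is a small Python sketch that labels each run of text with the tags immediately around it. It only mimics the spirit of the "# type:" comments; the function `type_lines` and its tokenizing regex are assumptions, not po4a's real algorithm.

```python
import re

# Hypothetical sketch: label each text run with its neighbouring tags.
TOKEN = re.compile(r'(<[^>]+>)')

def type_lines(xml):
    """Return (type, text) pairs for every non-blank run of text."""
    # Splitting on a capturing group keeps the tags in the token list.
    tokens = [t for t in TOKEN.split(xml) if t.strip()]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.startswith('<'):
            continue  # a tag, not text
        before = tokens[i - 1] if i > 0 else ''
        after = tokens[i + 1] if i + 1 < len(tokens) else ''
        pairs.append((before + after, ' '.join(tok.split())))
    return pairs

doc = ('<book><chapter><title>ch.1</title></chapter>'
       ' &aaa; &chap3; &bbb; '
       '<chapter><title>ch.2</title></chapter></book>')
labels = type_lines(doc)
for kind, text in labels:
    print('# type:', kind, '|', text)
```

With this input the entities sit outside any chapter, so the middle pair is labelled `</chapter><chapter>`, which is exactly the situation described above.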

> Also, I don't like the substitution of the content here:
>
> "&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
> "of chapt1] &ccc; &aaa;"
>
> As you see, the content of chapter1 appears twice (must be translated
> twice). Instead of this, I think that inclusion entities should be treated
> like the substitution entities (the content is translated once, and their
> appearances should be left as they are): &aaa; appears twice in this
> msgid, and its content is only translated once.

Yes, but that only comes from the fact that this example is very very (very)
artificial. When you do insane stuff such as creating a file containing only
a few words (i.e., using a file to provide the functionality of substitution
entities), as I did in this example, you cannot expect to get sane results,
can you?

In regular use, each file will contain a bunch of tags, which will thus be
parsed separately. Moreover, I'd prefer not to care about people including
the same file twice in the same master document. That's insane.

> Now I've still tried to complicate it a little more. I've tried to put
> some tags into a substitution entity (I've used it in real documents) and
> then, the entity disappears from the generated po.

Sure, if there is only one tag per substitution entity, the optimisation
applies. If that's not the case, please come up with an example ;)
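
For reference, one possible reading of that optimisation, sketched in Python: a string is skipped when it holds nothing but entity references, tags and whitespace. This is an assumption about what "an entity only (or tags only)" means; the names `ONLY_MARKUP` and `is_translatable` are made up, and the real test in po4a may well be narrower.

```python
import re

# Assumption: a msgid is "markup only" when it consists solely of entity
# references, tags and whitespace. po4a's real criterion may differ.
ONLY_MARKUP = re.compile(r'^(?:\s|&[\w.-]+;|<[^<>]+>)*$')

def is_translatable(msgid):
    """Return True when the string contains something besides markup."""
    return not ONLY_MARKUP.match(msgid)

samples = ['&chap;', ' &aaa;  &bbb; ', '<xref linkend="x"/>',
           'see &aaa; for details']
for s in samples:
    print(repr(s), is_translatable(s))
```

Under this reading, an entity whose definition is just one tag yields nothing to translate, which would explain why it disappears from the generated po.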

Bye, Mt.

-- 
In deepest France, there are mostly cavers.
   -- Le Chat
