[Po4a-devel]Some comments

Martin Quinson mquinson@ens-lyon.fr
Mon, 24 May 2004 15:19:21 -0700


OK, I found that mail again. It was way too long, and I didn't see that the
missing files weren't the only issue... Thanks for the reminder.

For the sake of clarity, I snipped all the parts about things that work now.

On Fri, May 07, 2004 at 03:43:38PM +0200, Jordi Vilalta wrote:
> > po4a skips the generation of msgid containing an entity only (or tags only).
> > It will now issue a warning when such optimizations are done. Thanks for the
> > report. [At least this is what I planned, but the msgid containing spaces
> > along with entities were not detected. This is also fixed]
>
> Now it seems to skip this kind of msgids (the version I tried some days
> ago didn't), but it has an irregular behavior. I've done the following
> (meaningless) test:

When I redo the test, I get something that matches what I expect:
====[/tmp/a]====
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
<!ENTITY chap SYSTEM "chapter1.xml">
<!ENTITY chap2 SYSTEM "chapter2.xml">
<!ENTITY aaa "contens of aaa">
<!ENTITY bbb "contens of bbb">
<!ENTITY ccc "contens of ccc">
]>

<book>
        &chap0;
        &chap;
        &chap2;
        &aaa;
        &chap3;
        &bbb;
        &chap;
        &ccc;
        &aaa;
</book>
====[/tmp/chapter1.xml]====
[content of chapt1]
====[/tmp/chapter2.xml]====
[content of chapt2]
====[generated po file]====
# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2004-05-24 14:10-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING"

# type: definition of entity &aaa;
#, no-wrap
msgid "contens of aaa"
msgstr ""

# type: definition of entity &bbb;
#, no-wrap
msgid "contens of bbb"
msgstr ""

# type: definition of entity &ccc;
#, no-wrap
msgid "contens of ccc"
msgstr ""

# type: <book></book>
msgid ""
"&chap0; [content of chapt1] [content of chapt2] &aaa; &chap3; &bbb; [content "
"of chapt1] &ccc; &aaa;"
msgstr ""
====[end of files]====

The type line looks OK to me, and there is no reference line for entity
definitions. That way, it is not broken ;)

> When watching the contents of the msgids, it seems that it skips only the
> inclusion entities that it knows, and gives the "substitution" entities
> up:

No, we substitute only inclusion entities, and never the substitution ones.
This is exactly what I wanted, since expanding them would force the
translator to update his work each time the &version; entity is updated,
which is exactly contrary to the philosophy of this mechanism.
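
As a rough illustration of that rule (a Python sketch, not po4a's actual Perl
code; the function name and the dictionary of inclusion entities are
hypothetical), references to known inclusion entities are replaced by the
content of the included file, while every other reference is left as-is:

```python
import re

# Matches a general entity reference such as &chap; or &aaa;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def expand_inclusions(text, inclusion_entities):
    """Expand only the known inclusion (SYSTEM) entities.

    `inclusion_entities` is a hypothetical mapping from entity name to the
    content of the included file. Substitution entities and unknown entities
    are returned untouched, so the translator still sees e.g. &aaa;.
    """
    def replace(match):
        name = match.group(1)
        if name in inclusion_entities:
            return inclusion_entities[name]
        return match.group(0)  # keep the reference verbatim
    return ENTITY_REF.sub(replace, text)

# Same data as the /tmp/a test above: chap and chap2 are inclusion entities.
body = "&chap0; &chap; &chap2; &aaa; &chap3; &bbb; &chap; &ccc; &aaa;"
inclusions = {"chap": "[content of chapt1]", "chap2": "[content of chapt2]"}
result = expand_inclusions(body, inclusions)
print(result)
```

With the test data above, this yields the same string as the <book></book>
msgid in the generated po file.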

> I think there are 2 alternative ways to treat these cases better:
>   1) Exclude all entities-only messages (any number, known or unknown)
>   2) Include the whole messages that have more than 1 entity (known or
>      unknown), because in some languages it may be interesting to change
>      the order of some of them.

As the source code reflects, the second option is the one selected, for the
reason you give ;)

> hmmm, now I was thinking about the standard entities that define special
> characters, as &acute; and I've seen that they're also excluded if there's
> something like <title>&Acute;</title>. Seeing this, I prefer not to
> exclude any entities. In some cases it can be a little annoying for the
> translators, but else, there could be some untranslatable strings.

Hmm. This example looks a bit artificial, doesn't it? Anyway, I added an
'include-all' option to the module to disable those optimisations.

Passing options to modules is one of the novelties introduced in the CVS
version. For example, it would be:
po4a-gettextize -t sgml -o include-all -m bla.sgml -p bla.pot
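
To make the skipping rule from this thread concrete, here is a minimal Python
sketch. It is an assumption about the behaviour described in this mail, not
the module's real code: the function name and the include_all parameter are
hypothetical stand-ins for the 'include-all' option, and the tags-only case is
not handled.

```python
import re

# Matches a general entity reference such as &chap; or &Acute;
ENTITY = re.compile(r"&[A-Za-z_][\w.-]*;")

def should_extract(message, include_all=False):
    """Decide whether a message should become a msgid.

    Assumed rule, taken from the discussion above:
      - with include_all, nothing is skipped;
      - a message with real text besides its entities is kept;
      - a message made only of entities is kept when it has more than
        one of them (the translator may want to reorder them);
      - a message made of a single entity (plus spaces) is skipped.
    """
    if include_all:
        return True
    entities = ENTITY.findall(message)
    remainder = ENTITY.sub("", message).strip()
    if remainder:
        return True
    return len(entities) > 1

for sample in ["&chap;", "  &chap; ", "&aaa; &bbb;", "See &aaa;"]:
    print(repr(sample), should_extract(sample))
```

Under this rule a lone &Acute; would be skipped by default and kept once
include_all is set, which is the case the option is meant to cover.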

> > > <!ENTITY % common SYSTEM "common.ent">
> > > %common;
> > > ]>
> > >
> > > I don't know if it's somewhat strange. The DocBook parser accepts it.
> > > The common.ent file has only entities, which are extended in the main
> > > document definition.
> >
> > If that's legal (i.e., if nsgmls accepts it), I'll have to accept it. Again,
> > please file a bug about this (another one). Do not forget to attach an
> > example file that is valid but refused by po4a. For example, are you sure
> > that it's %common; and not &common; ?
>
> nsgmls accepts them, and it parses the included files (it has given some
> errors in the included files ;)
> The %something; entities are a different kind. I don't know all their
> properties, but they're expandable into the doctype header (and standard
> entities aren't). I'll file the bug in a while.

Yeah, please do so; I don't have the time/motivation to handle this one today.

I'll try to think about committing all my changes soon.

Bye, Mt.

-- 
Freedom is not free.
