[Po4a-devel]Sgml bug in the tracker
Martin Quinson
martin.quinson@loria.fr
Tue, 10 May 2005 20:31:25 +0200
Sorry for the delay, I was playing with shadow ;)
On Mon, May 09, 2005 at 01:59:09AM +0200, Nicolas François wrote:
> Hello,
>
> There is a bug reported on the Alioth tracker against the Sgml module.
>
> I did not notice it before.
> Was there a notification on po4a-devel@lists.alioth?
There should have been. The bug tracker is configured to send everything to
po4a-devel@lists.alioth.debian.org
See
https://alioth.debian.org/tracker/admin/index.php?group_id=30267&atid=410622&update_type=1
> Otherwise, is there a way to get some notifications from the tracker?
>
>
> Then regarding the bug report:
> * I've already uploaded a simple fix for a typo reported in the bug
> report.
> * the SGML book uses a contrib and an epigraph tag. Are those tags
> standard? Can I add them to the translate category?
I dunno; please do so. If it helps for this document, that's good. There's
almost no chance that it breaks anything.
> * for the main part of the bug report, I propose to escape '<', '>' and
> '&' to {PO4A-lt}, {PO4A-gt} and {PO4A-amp} before feeding nsgmls. And
> changing them back to the original in the cdata type.
Great, that's what we have to do.
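For reference, the round trip could look roughly like the sketch below. It
is written in Python only to keep it short and self-contained; it is not the
sgml.pm code, and the helper names escape_markup and restore_markup are made
up for illustration. Only the three placeholder strings come from the
proposal above.

```python
# Placeholder strings as proposed above; everything else is hypothetical.
PLACEHOLDERS = [
    ("&", "{PO4A-amp}"),
    ("<", "{PO4A-lt}"),
    (">", "{PO4A-gt}"),
]


def escape_markup(text):
    """Hide '<', '>' and '&' before the text is handed to the parser."""
    for char, token in PLACEHOLDERS:
        text = text.replace(char, token)
    return text


def restore_markup(text):
    """Put the original characters back once the cdata comes out again."""
    for char, token in PLACEHOLDERS:
        text = text.replace(token, char)
    return text


cdata = 'if (a < b && c > d) { print "&amp;"; }'
escaped = escape_markup(cdata)
# The escaped form contains none of the three special characters,
# and restoring it gives back the original text.
assert not any(ch in escaped for ch in "<>&")
assert restore_markup(escaped) == cdata
```

One caveat with this approach: a document that already contains a literal
{PO4A-amp} (or one of the other placeholders) would get it turned into the
plain character on the way back.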
> I also had some other issues with this PHP book:
> * around line 795, PO4A-beg/end are changed back to their SGML
> counterparts only if they appear at the beginning of a line.
> Why only at the beginning?
I can't remember. It's been a *long* time since I last dug into sgml.pm,
and I have bad memories of it. The code is a bit obscure,
and there is a bunch of stuff we should move to TransTractor (file
inclusion) or do another way (I dream of killing nsgmls).
> This causes some PO4A-beg/end to be kept in the output document.
If so, this is a bug ;)
> * also, the content of the cdata is pushed, but the buffer is not
> flushed, so it can be pushed too early.
> In my patch, I appended the content of the cdata to $buffer.
> Should the content of cdata be verbatim? Shouldn't it be translated?
I think it should be verbatim. I'm not sure anymore about translation.
> * also, I don't really understand what is done with the leading spaces
> and the added trailing '\n', but this is probably not an issue.
What I absolutely want to avoid here is getting the whole document on a
single line, since that kills any hope of using addenda. So I try to get one
structuring tag per line, and to add some spaces around it to make the output
look better. But this code may well be buggy too...
> * around line 535, & is changed to {PO4A-amp} if it is not the beginning
> of an entity.
> This uses:
> while ($origfile =~ /^(.*?)&([^;\s]*);(.*)$/s) {
> ...
> }
> This regex is too permissive. It causes the following line:
> ]]><![CDATA[&d_op=viewdownload&cid=79\">Web Installer...
> to be changed into:
> ]]><![CDATA[_op=viewdownload=79\">Web Installer...
>
> I found the following grammar (for XML):
> http://www.w3.org/TR/REC-xml/#NT-Name
> It's probably too complicated (the Letter or Digit rules use a lot of
> Unicode chars). So I propose to only allow ASCII chars (with a non
> greedy match):
> while ($origfile =~ /^(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*?);(.*)$/s) {
> ...
> }
Oops. :-/
By the way, you can make it greedy: ";" is not in the character class, so it
won't make any difference, will it?
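To make the difference concrete, here is a small sketch comparing the
patterns. It is a Python transliteration for illustration only, assuming the
trailing modifier in the Perl is meant to be /s (re.S here); the
first_entity helper and the sample string are made up and are not taken from
sgml.pm or from the PHP book.

```python
import re

# The old pattern and the two variants of the proposed one.
PERMISSIVE = re.compile(r'^(.*?)&([^;\s]*);(.*)$', re.S)
STRICT_LAZY = re.compile(r'^(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*?);(.*)$', re.S)
STRICT_GREEDY = re.compile(r'^(.*?)&([A-Za-z_:][-_:.A-Za-z0-9]*);(.*)$', re.S)


def first_entity(pattern, text):
    """Return what the pattern takes for the first entity name, or None."""
    match = pattern.match(text)
    return match.group(2) if match else None


# Made-up input: bare ampersands in a URL, followed by a real entity.
sample = '<a href="get.php?x=1&d_op=viewdownload&cid=79">Web&nbsp;Installer</a>'

# The old pattern swallows everything from the first '&' to the first ';'.
print(first_entity(PERMISSIVE, sample))     # d_op=viewdownload&cid=79">Web&nbsp
# The restricted pattern skips the bare ampersands and finds the entity.
print(first_entity(STRICT_LAZY, sample))    # nbsp
print(first_entity(STRICT_GREEDY, sample))  # nbsp
```

Since the name must be followed directly by ";" and ";" cannot be part of
the name, the lazy and greedy variants end up capturing the same thing.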
> * my last point: can anybody have a look at the sgmldiff between
> EN-Book.sgml and po4a-normalize.output?
>
> I'm highly incompetent regarding SGML and I based my analysis on po4a and
> sgmldiff outputs. So please stop me if any of the above statements is
> wrong.
I'm rather short on time, but I'll try to do so. The statements look good.
> Attached is the patch I plan to commit this week.
No need to wait that long ;)
Thanks again for your time,
Mt.