[Po4a-devel]HTML translating
Martin Quinson
martin.quinson@imag.fr
Wed, 10 Nov 2004 22:41:27 +0100
On Wed, Nov 10, 2004 at 05:36:29PM +0000, Yves Rutschle wrote:
> On Mon, Nov 08, 2004 at 02:22:42PM +0100, Martin Quinson wrote:
> > Ok. I wanted to reply to this message the way it deserves (with a long
> > argument to back up my point)
>
> Thank you for sharing your experience; I'm getting convinced
> now.
Thanks for your patience. I'll have to be even shorter tonight...
> [splitting into HTML blocks]
> > > That's actually fairly easily achievable: the list of
> > > paragraph-marking tags is fairly small (<p>, <div>,
> > > <h1,2,3,4,...>) and XHTML makes it mandatory for text to be
> > > included in a block-level element of some sort.
> >
> > You thus have to show some formatting tags to the translators. We do so in
> > all other modules. I don't see any better idea.
>
> Ok. Well, I'm afraid that means I'm gonna have to ditch the
> current Html.pm and redo one from scratch (bar a couple of
> routines that may be rescued).
I see three ways to implement an HTML module:
- pretend HTML is an XML dialect (XHTML is one), and use Jordi's parser.
  The module should be about 20 lines long. See the Guide module for an example.
- pretend HTML is an SGML dialect, and use the sgml module for that. It
  will work if all HTML pages begin with a prolog stating the DTD. That
  should be the case, shouldn't it?
  Then you have to list all tags in the relevant lists around line 400 of
  Sgml.pm. Just add a " } elsif ($prolog =~ /html/i) {" block, and
  do the same as for the other DTDs.
- recognize that HTML is unique. You have to implement a whole new module in
  that case. You may well want to check how we did it for the sgml and xml
  modules. The best approach may be to translate a file with both of them, or so.
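
To make the first two options more concrete, here is a minimal sketch in Python (not po4a's Perl, and not taken from Sgml.pm or Jordi's parser). It checks the prolog the way the suggested "$prolog =~ /html/i" test would, then cuts the page into block-level strings while leaving inline tags visible to the translator. The tag lists and the names looks_like_html, BlockExtractor and extract_blocks are assumptions made up for this illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag lists, for illustration only; the real lists would live
# in the po4a module and are not reproduced here.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "th"}
VOID_TAGS = {"img", "br", "hr"}


def looks_like_html(prolog):
    """Python counterpart of the `$prolog =~ /html/i` test from the mail."""
    return re.search(r"html", prolog, re.IGNORECASE) is not None


class BlockExtractor(HTMLParser):
    """Collect the content of block-level elements, inline tags included.

    Simplification: text that sits directly in an outer block after a
    nested block has closed is ignored.
    """

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished translatable strings
        self._current = None  # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
            self._current = []
        elif self._current is not None:
            # Inline tag: keep it verbatim so the translator sees it.
            self._current.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif self._current is not None and tag not in VOID_TAGS:
            self._current.append(f"</{tag}>")

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def _flush(self):
        if self._current is not None:
            # Normalize whitespace, as a PO entry would not keep line wraps.
            text = " ".join("".join(self._current).split())
            if text:
                self.blocks.append(text)
        self._current = None


def extract_blocks(document):
    """Return the translatable blocks of a page whose prolog mentions HTML."""
    prolog = document.lstrip().split(">", 1)[0]
    if not looks_like_html(prolog):
        raise ValueError("prolog does not declare an HTML DTD")
    parser = BlockExtractor()
    parser.feed(document)
    parser.close()
    return parser.blocks
```

Fed a page with an XHTML prolog, a heading and one paragraph containing a link, extract_blocks returns two strings, and the second one still carries the complete <a href=...> tag. That is the "show some formatting tags to the translators" behaviour discussed above.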
> This is a <a
> href="blahblah.com/this/that/blah.html">link</a> to <img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">
>
> [doesn't] belong in a PO.
>
> So I'd propose to collapse the inside of long inline tags,
> so as to simply state there is a tag (e.g. "you're in a
> link") without detailing what the tag contains. Thus, the
> example line would appear, in the PO, as:
>
> This is a <a>link</a> to <img>blah</img>
I'm not fond of this because if the translator wants or has to reorder the
links, you'll have trouble. Check the gettext info file, in the section
explaining what "%2$s" is good for. It's not impossible, but you have to
deal with it.
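
One possible way to deal with it, sketched in Python under stated assumptions: number each collapsed tag, in the spirit of gettext's "%2$s", so that the attributes can be put back even when the translation reorders the links. The "<a:1>" stub syntax and the helper names collapse and restore are invented for this example; nothing here is existing po4a behaviour.

```python
import re

# Matches an opening tag that carries attributes, e.g. <a href="x.html">.
TAG_WITH_ATTRS = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\s+([^<>]*?)\s*/?>")
# Matches a numbered stub such as <a:1> (hypothetical notation).
STUB = re.compile(r"<[a-zA-Z][a-zA-Z0-9]*:(\d+)>")


def collapse(text):
    """Replace each attribute-bearing tag by a numbered stub such as <a:1>.

    Returns the collapsed string and a table mapping numbers to the
    original tags.
    """
    originals = {}

    def stub(match):
        index = len(originals) + 1
        originals[index] = match.group(0)
        return f"<{match.group(1)}:{index}>"

    return TAG_WITH_ATTRS.sub(stub, text), originals


def restore(translated, originals):
    """Put the original tags back, whatever order the stubs now appear in."""
    return STUB.sub(lambda m: originals[int(m.group(1))], translated)


source = 'See <a href="a.html">the manual</a> or <a href="b.html">the FAQ</a>.'
msgid, table = collapse(source)
# The translator swaps the two links; the numbers keep track of which is which.
msgstr = 'Voir <a:2>la FAQ</a> ou <a:1>le manuel</a>.'
result = restore(msgstr, table)
```

With plain "<a>...</a>" stubs and no numbers, the two links in msgstr would be indistinguishable, and the href values would be reattached in source order. Note that this sketch also hides attributes such as alt and title, which may themselves need translating.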
> [HTML::Parser vs Jordi's XML parser]
> > Moreover, I'd be pleased to cut a dependency. I hate unjustified
> > dependencies, but it may be personal.
>
> Me too, but I hate reimplementation of code (reinventing the
> wheel) more.
Then that's an argument for pretending that HTML is XML or SGML and not
reimplementing any specific po4a module :)
Ok, I'm sorry, this mail really should be longer, but I'm out of time, man.
Mt.