[Po4a-devel] HTML translating

Martin Quinson martin.quinson@imag.fr
Wed, 10 Nov 2004 22:41:27 +0100

On Wed, Nov 10, 2004 at 05:36:29PM +0000, Yves Rutschle wrote:
> On Mon, Nov 08, 2004 at 02:22:42PM +0100, Martin Quinson wrote:
> > Ok. I wanted to reply to this message the way it deserves (with a long
> > argument to support my point)
>
> Thank you for sharing your experience; I'm getting convinced
> now.

Thanks for your patience. I'll have to be even shorter tonight...

> [splitting into HTML blocks]
> > > That's actually fairly easily achievable: the list of
> > > paragraph-marking tags is fairly small (<p>, <div>,
> > > <h1,2,3,4,...>) and XHTML makes it mandatory for text to be
> > > included in a block-level element of some sort.
> >
> > You thus have to show some formatting tags to the translators. We do so in
> > all other modules. I don't see any better idea.
>
> Ok. Well, I'm afraid that means I'm gonna have to ditch the
> current Html.pm and redo one from scratch (bar a couple of
> routines that may be rescued).
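For what the block splitting discussed above could look like, here is a minimal sketch in Python (not po4a's Perl), using the standard html.parser module. It assumes the paragraph-marking tag list Yves gives (<p>, <div>, <h1> to <h6>) and that, as in XHTML, all text sits inside such a block; text outside any block is simply dropped. Inline tags are re-emitted with their attributes, so the translator sees them in the msgid. The class, function and constant names are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical list of paragraph-marking tags, following the message above.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6"}


def render_attrs(attrs):
    """Re-serialize (name, value) pairs; value is None for bare attributes."""
    return "".join(f' {k}="{v}"' if v is not None else f" {k}" for k, v in attrs)


class BlockSplitter(HTMLParser):
    """Turn each block-level element into one msgid, keeping inline tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []     # finished msgids, in document order
        self.current = None  # pieces of the open block, or None outside blocks
        self.depth = 0       # how many block elements are currently open

    def flush(self):
        if self.current is not None:
            text = " ".join("".join(self.current).split())  # normalize whitespace
            if text:
                self.blocks.append(text)
        self.current = None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()  # a nested block ends the text collected so far
            self.depth += 1
            self.current = []
        elif self.current is not None:
            self.current.append(f"<{tag}{render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        # Empty XHTML tags such as <br/> stay inline in the msgid.
        if self.current is not None:
            self.current.append(f"<{tag}{render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
            self.depth = max(self.depth - 1, 0)
            # Still inside an outer block: keep collecting its remaining text.
            self.current = [] if self.depth > 0 else None
        elif self.current is not None:
            self.current.append(f"</{tag}>")

    def handle_data(self, data):
        if self.current is not None:  # text outside any block is ignored
            self.current.append(data)


def split_blocks(html):
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks
```

For instance, '<div><h1>Title</h1><p>This is a <a href="x.html">link</a>.</p></div>' gives two msgids: "Title" and the paragraph with its <a href="x.html"> tag left visible.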

I see three solutions to implement an html module:
  - pretend html is an xml dialect (xhtml is), and use Jordi's parser.
    It should be about 20 lines long. See the Guide module for an example.
  - pretend html is an sgml dialect, and use the sgml module for that. It
    will work if all html pages begin with a prolog stating the dtd. That
    should be the case, shouldn't it?
    Then you have to list all tags in the relevant lists around line 400 of
    Sgml.pm. Just add a "    } elsif ($prolog =~ /html/i) {" block, and
    do the same as for the other DTDs.
  - recognize that html is unique. You have to implement a whole new module in
    that case. You may well want to check how we did it for the sgml and xml
    modules. The best may be to translate a file with both of them, or so.
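The second option hinges on picking the DTD from the prolog. As a hedged illustration only (Python, not the actual Sgml.pm code), the chain of "$prolog =~ /.../i" tests can be mimicked as below; the dialect table and the function name are hypothetical. A page without a DOCTYPE yields None, which is exactly the case where that approach breaks down.

```python
import re

# Hypothetical dialect table: the first pattern found in the prolog wins,
# much like a chain of "} elsif ($prolog =~ /.../i) {" blocks.
DIALECTS = [
    ("debiandoc", re.compile(r"debiandoc", re.IGNORECASE)),
    ("docbook", re.compile(r"docbook", re.IGNORECASE)),
    ("html", re.compile(r"html", re.IGNORECASE)),
]

# Optional XML declaration (common in XHTML files), then the DOCTYPE prolog.
# Simplification: no '>' inside the DOCTYPE (no internal subset).
PROLOG = re.compile(r"\s*(?:<\?xml[^>]*\?>\s*)?(<!DOCTYPE[^>]*>)", re.IGNORECASE)


def detect_dialect(document):
    """Return the dialect named in the DOCTYPE prolog, or None."""
    match = PROLOG.match(document)
    if match is None:
        return None  # no prolog at all: the DTD cannot be guessed this way
    prolog = match.group(1)
    for name, pattern in DIALECTS:
        if pattern.search(prolog):
            return name
    return None
```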

> This is a <a
> href="blahblah.com/this/that/blah.html">link</a> to <img src="blahblah.com/this/that/blah.png" alt="blah" title="Blah">
>
> [doesn't] belong in a PO.
>
> So I'd propose to collapse the inside of long inline tags,
> so as to simply state there is a tag (e.g. "you're in a
> link") without detailing what the tag contains. Thus, the
> example line would appear, in the PO, as:
>
> This is a <a>link</a> to <img>blah</img>

I'm not fond of this because if the translator wants or has to reorder the
links, you'll have trouble. Check the gettext info file, in the section
explaining what "%2$s" is good for. It's not impossible, but you have to
deal with it.
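To make the reordering problem concrete, here is a small Python sketch (hypothetical helper names, naive regex tag matching that assumes no '>' inside attribute values) that collapses attributes as Yves proposes and restores them by position. When the translator swaps the two links, each href ends up on the wrong text. Numbering the collapsed tags, in the spirit of gettext's "%2$s", would be one way out; this sketch deliberately does not do that.

```python
import re

# Opening tag with attributes, e.g. <a href="...">; closing tags never match.
OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)(\s[^>]*)>")
# Opening tag without attributes, as it appears in the collapsed msgid.
BARE_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)>")


def collapse(text):
    """Strip attributes from opening tags, remembering them in order."""
    saved = []

    def repl(m):
        saved.append((m.group(1), m.group(2)))
        return f"<{m.group(1)}>"

    return OPEN_TAG.sub(repl, text), saved


def restore(text, saved):
    """Give the Nth bare tag of each name the Nth saved attributes of that name."""
    queues = {}
    for name, attrs in saved:
        queues.setdefault(name, []).append(attrs)

    def repl(m):
        queue = queues.get(m.group(1))
        if not queue:
            return m.group(0)  # bare tag with nothing saved: leave it alone
        return f"<{m.group(1)}{queue.pop(0)}>"

    return BARE_TAG.sub(repl, text)


if __name__ == "__main__":
    src = 'See <a href="a.html">A</a> and <a href="b.html">B</a>.'
    msgid, saved = collapse(src)
    print(msgid)  # See <a>A</a> and <a>B</a>.
    # The translator swaps the links; restoring by position mislinks them.
    print(restore("Voir <a>B</a> et <a>A</a>.", saved))
    # Voir <a href="a.html">B</a> et <a href="b.html">A</a>.
```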

> [HTML::Parser vs Jordi's XML parser]
> > Moreover, I'd be pleased to cut a dependency. I hate unjustified
> > dependencies, but it may be personal.
>
> Me too, but I hate reimplementation of code (reinventing the
> wheel) more.

Then that's an argument for pretending that html is xml or sgml, and not
reimplementing any specific po4a module :)


Ok, I'm sorry, this mail really should be longer, but I'm out of time, man.

Mt.