[Po4a-devel] Broken encoding in http://po4a.alioth.debian.org/it/po4a.7.html

Fri, 13 Aug 2004 10:29:33 +0200 (CEST)

On Thu, 12 Aug 2004, Danilo Piazzalunga wrote:
> At 19:10 on Thursday, 12 August 2004, Martin Quinson wrote:
>> Hello,
>>
>> I was admiring the new translation and trying to read it (without really
>> speaking Italian) when I discovered some encoding issues. It looks like
>> UTF-8 characters handled by a tool that is not UTF-8-ready.
>
> In the HTML you can see things such as:
> "Perch&Atilde;&copy;" instead of "Perché"

Yes, it seems mpod2html assumes that the input document is in ISO-8859
and works with its eyes closed :(
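
The symptom can be reproduced in a few lines. This is a minimal Python
sketch, not mpod2html's actual code: it only assumes that some tool reads
UTF-8 bytes as ISO-8859-1 and then escapes non-ASCII characters as HTML
entities. The function names are made up for the illustration.

```python
from html.entities import codepoint2name


def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1.

    This mimics a tool that assumes ISO-8859-1 input (an assumption,
    not a confirmed description of mpod2html).
    """
    return text.encode("utf-8").decode("iso-8859-1")


def escape_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append("&%s;" % codepoint2name[code])
        else:
            out.append("&#%d;" % code)
    return "".join(out)


garbled = escape_entities(misread_as_latin1("Perch\u00e9"))
print(garbled)  # prints Perch&Atilde;&copy;
```

The single character "é" is two bytes in UTF-8 (0xC3 0xA9), and each byte
becomes its own ISO-8859-1 character, which is why two entities show up.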

>> After some investigation, it's possible that the corruption comes from
>> the po file itself.
> [...]
>> it's even possible that the po file is clean but my less is broken.
>
> The PO itself is clean. Recode doesn't complain. Try "LANG=C less <file>"
> and you will see that the PO is really UTF-8.
>
> Likely, some tool dealing with manpages expects them to use the ISO-8859-1
> charset. I already had a similar experience: one UTF-8 page looked fine
> when viewed directly (man ./foo.1), but when installed and viewed with
> "man 1 foo" it showed the same problem.

The man pages also show the raw UTF-8 codes. I've tried both installing
them and opening the file directly, and both fail.

> The files could be recoded to either ISO-8859-1 or ISO-8859-15, but the
> real problem lies elsewhere.

The easiest (and fastest) solution would be (as you say) to recode the po
file to the encoding the final documents should be in (and maintain the
translation in the recoded po).

But this is only a temporary solution. When we get the first non-European
translations, this problem will be back.
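
A rough Python sketch of the recoding workaround, and of why it stops
working outside Western European languages. The name recode_po and the
sample strings are hypothetical; the sketch assumes the PO header declares
charset=UTF-8 and that this declaration has to be updated together with
the bytes.

```python
def recode_po(data: bytes, target: str = "ISO-8859-1") -> bytes:
    """Recode UTF-8 PO content to `target` and update the header charset."""
    text = data.decode("utf-8")
    # Keep the declared charset in sync with the real encoding.
    text = text.replace("charset=UTF-8", "charset=" + target, 1)
    # Raises UnicodeEncodeError if a character does not exist in `target`.
    return text.encode(target)


header = '"Content-Type: text/plain; charset=UTF-8\\n"\n'
italian = (header + 'msgstr "Perch\u00e9"\n').encode("utf-8")
japanese = (header + 'msgstr "\u306a\u305c"\n').encode("utf-8")

# Italian fits in ISO-8859-1, so the recoding succeeds.
recoded = recode_po(italian)

# Japanese text cannot be represented in ISO-8859-1.
try:
    recode_po(japanese)
    japanese_ok = True
except UnicodeEncodeError:
    japanese_ok = False
```

Here the Italian sample recodes cleanly, while the Japanese one raises
UnicodeEncodeError, which is the situation to expect with the first
non-European translation.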

I think the real issue is that the programs that deal with the translated
documents don't have encoding support. The translation of the binaries
seems to work well, although its po is also in UTF-8 (gettext rocks ;)

An intermediate solution could be to add an option to po4a-translate to
specify the encoding you want for the output document. This would also be
useful when translating files gettextized with the po4a script: the
generated po could be in UTF-8, since it mixes files of different formats
and maybe different encodings, while you may want some output files in
different encodings.

It would be very rudimentary (but necessary?) to specify an output
encoding for each language...
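
To make the idea concrete, here is a sketch in Python of what a
per-language output encoding could look like. Nothing here is an existing
po4a option or file format: the OUTPUT_ENCODINGS table, its contents, the
file name and write_translated() are all invented for illustration.

```python
import os
import tempfile

# Hypothetical table: language code -> encoding of the generated document.
OUTPUT_ENCODINGS = {"it": "iso-8859-1", "ca": "iso-8859-15", "ja": "euc-jp"}
DEFAULT_ENCODING = "utf-8"


def write_translated(path: str, text: str, lang: str) -> str:
    """Write `text` in the encoding chosen for `lang`; return that encoding."""
    encoding = OUTPUT_ENCODINGS.get(lang, DEFAULT_ENCODING)
    with open(path, "w", encoding=encoding, newline="\n") as out:
        out.write(text)
    return encoding


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "po4a.7.it")
    used = write_translated(path, "Perch\u00e9\n", "it")
    # Read the raw bytes back to see what was actually written.
    with open(path, "rb") as check:
        raw = check.read()
```

With this table the Italian page is written with "é" as the single byte
0xE9, while a language missing from the table falls back to UTF-8.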

This would also lead to a redesign of the po4a config files, since we need
to specify more information there. This could be done together with the
TransTractor redesign explained in the "Future directions" section. Well,
we should leave all this for a future release... but we should begin
thinking about it ;)

Regards,

Jordi Vilalta