[Po4a-devel] Encoding options

Denis Barbier barbier@linuxfr.org
Wed, 4 Aug 2004 00:29:54 +0200


On Tue, Aug 03, 2004 at 11:46:50PM +0200, Jordi Vilalta wrote:
[...]
> >PO files declare their encoding in their header field, so this option
> >is only relevant when generating PO files.  This is done only once, and
> >the person who generates PO files in this context often does not know which
> >encoding is used, so it is better to have an invalid charset as in
> > "Content-Type: text/plain; charset=CHARSET\n"
> >and let translators put in the right value.
> 
> OK, good point. The other issue is the charset of the msgids,
> because they have to be converted from the master file's charset to
> some other charset. What should it be?

By default, ASCII, or UTF-8 if they contain non-ASCII characters.
IIRC, xgettext only accepts --from-code=UTF-8; other values make no
sense, because the encoding stored in the POT file must be compatible
with all other encodings.
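The default described above (ASCII, or UTF-8 as soon as a non-ASCII
character appears) can be sketched as follows.  This is only an
illustration in Python; msgid_encoding() and encode_msgids() are
hypothetical helpers, not part of po4a or gettext, and the msgids are
assumed to be already decoded from the master file's charset.

```python
def msgid_encoding(msgids):
    """Return the charset to declare for a list of decoded msgids.

    Hypothetical helper: "ASCII" when every msgid is pure ASCII,
    "UTF-8" otherwise, since UTF-8 can represent any character that
    translators' encodings can.
    """
    if all(s.isascii() for s in msgids):
        return "ASCII"
    return "UTF-8"


def encode_msgids(msgids):
    """Encode msgids to bytes with the charset chosen by msgid_encoding()."""
    charset = msgid_encoding(msgids)
    # Python's codec registry accepts "ASCII" and "UTF-8" as written.
    return charset, [s.encode(charset) for s in msgids]


if __name__ == "__main__":
    print(msgid_encoding(["Hello", "World"]))  # ASCII
    print(msgid_encoding(["Hello", "Café"]))   # UTF-8
```

Pure-ASCII msgids need no declaration beyond ASCII, which is why no
recoding is visible in that case.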

> >>po4a-translate has no option to select the charset of the localized
> >>file. Should we add one? Or should it take it from the PO file?
> >>
> >>What should the defaults be? ISO-8859-1? UTF-8? Something else?
> >
> >ISO-8859-1 is culturally biased; the only choices are UTF-8 and the
> >charset of the PO file, so an --encoding=po|utf8 option should be sufficient.
> >I have no opinion about the default value.
> 
> Well, here I meant (for example) when gettextizing: if no charset is
> specified for the master file on the command line, and the format module
> cannot determine which encoding it uses, we have to convert it from
> something (the default?) to the PO file encoding.
> 
> Uhmmm, now it occurs to me that maybe there should be no recoding between
> the master file and the PO msgids. Am I right? I think I need some sleep :P
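The --encoding=po|utf8 option proposed above for po4a-translate could
behave roughly like this sketch.  Both function names are hypothetical,
and the option itself was only a proposal in this thread; the sketch also
refuses the "charset=CHARSET" placeholder, since that means the
translator has not yet filled in the real value.

```python
import re

# Stops at whitespace, a backslash (the literal "\n" in PO text) or a quote.
_CHARSET_RE = re.compile(r"charset=([^\s\\\"]+)")


def po_header_charset(header):
    """Extract the charset declared in a PO header (hypothetical helper)."""
    m = _CHARSET_RE.search(header)
    if m is None:
        raise ValueError("PO header has no charset declaration")
    charset = m.group(1)
    if charset == "CHARSET":
        # Placeholder left by whoever generated the PO file.
        raise ValueError("PO header still uses the CHARSET placeholder")
    return charset


def output_charset(encoding_option, po_header):
    """Pick the localized file charset for a proposed --encoding=po|utf8."""
    if encoding_option == "utf8":
        return "UTF-8"
    if encoding_option == "po":
        return po_header_charset(po_header)
    raise ValueError(f"unknown --encoding value: {encoding_option!r}")
```

For example, output_charset("po", 'Content-Type: text/plain;
charset=ISO-8859-1\n') returns "ISO-8859-1", while "utf8" ignores the
header entirely.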

It may be so; I do not follow you ;)
You were talking about po4a-translate and the localized file charset, and
now about gettextizing the master file.  In the latter case, if the master
file contains only ASCII, no conversion is performed.  Otherwise it has to
be recoded into UTF-8, and there is indeed a problem if the original
charset is not specified.  One could check whether it is UTF-8 and fall
back to ISO-8859-1 otherwise, but unspecified encodings really suck, so
let's be pedantic and force those people to declare their encoding.  After
all, they know the encoding used in their English documentation, so they
can add the right options to the po4a tools.
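The "pedantic" policy above, sketched in Python: ASCII input passes
through untouched, anything else is recoded to UTF-8 only when a charset
was declared.  master_to_utf8() is a hypothetical helper, not po4a's
actual code; it deliberately does not implement the rejected "try UTF-8,
then fall back to ISO-8859-1" guess.

```python
def master_to_utf8(data, declared_charset=None):
    """Recode master document bytes for use as msgids (hypothetical helper).

    data is the raw bytes of the master file; declared_charset is the
    encoding the user gave on the command line, or None if none was given.
    """
    if data.isascii():
        # ASCII is already valid UTF-8: no conversion needed.
        return data
    if declared_charset is None:
        # Refuse to guess: the document author must declare the encoding.
        raise ValueError(
            "master document contains non-ASCII bytes; "
            "please declare its encoding")
    # A wrong declaration raises UnicodeDecodeError (a ValueError subclass).
    return data.decode(declared_charset).encode("utf-8")
```

With this policy, master_to_utf8(b"Caf\xe9", "iso-8859-1") yields the
UTF-8 bytes for "Café", and the same call without a charset fails loudly
instead of silently producing mojibake.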

Denis