[Po4a-devel] Encoding options

Jordi Vilalta jvprat@wanadoo.es
Wed, 4 Aug 2004 14:46:41 +0200 (CEST)


OK, let me summarize what we have said so far (thanks to everyone for helping
me better understand the limitations of po files and what the encoding
handling is meant to achieve).


Here are the conditions we have to fulfil:

- msgids and msgstrs must share the same encoding
- msgids should only be ascii or utf-8
- ascii is preferred over utf-8 by translators
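A minimal Python sketch of how the second condition could be checked. The
helper name `msgid_charset` is hypothetical (not part of po4a): it classifies
the raw bytes of a msgid as ascii or utf-8 and rejects anything else.

```python
def msgid_charset(raw: bytes) -> str:
    """Classify raw msgid bytes as 'ascii' or 'utf-8'.

    Hypothetical helper: raises ValueError for anything that is
    neither, since msgids should only be ascii or utf-8.
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        raise ValueError("msgid is neither ascii nor utf-8") from None


if __name__ == "__main__":
    print(msgid_charset(b"Hello"))                # ascii
    print(msgid_charset("Café".encode("utf-8")))  # utf-8
```

Note that ascii is tried first: every ascii string is also valid utf-8, so the
order matters if we want to keep the ascii case (preferred by translators)
distinguishable.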


And here is a proposal for the processing steps:

* Handling the master document (in gettextize, translate and update):

- If a charset is specified on the command line, convert from that charset
   to utf-8 (and set the po charset to utf-8).
- Otherwise, if the format module can detect the encoding from the document,
   convert from the detected charset to utf-8 (and set the po charset to
   utf-8).
- If nothing can determine the file encoding, assume it's ascii and don't
   convert anything (and set the po charset to an invalid placeholder, so
   that the translator can set it).
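The decision for the master document could look roughly like this Python
sketch. The function `prepare_master` and its parameters are hypothetical, and
using the literal "CHARSET" as the invalid placeholder is an assumption
(borrowed from the value gettext leaves in fresh .pot headers), not something
po4a is known to do.

```python
from typing import Optional, Tuple

# Assumed placeholder for "charset unknown, translator must fill it in".
UNKNOWN_CHARSET = "CHARSET"


def prepare_master(raw: bytes,
                   cli_charset: Optional[str] = None,
                   detected_charset: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (master bytes to extract msgids from, po charset).

    Hypothetical helper: the command-line charset wins over the one
    detected by the format module.
    """
    source = cli_charset or detected_charset
    if source is not None:
        # A known charset: recode the master document to utf-8.
        return raw.decode(source).encode("utf-8"), "utf-8"
    # Nothing tells us the encoding: assume ascii, leave the bytes
    # untouched and leave the po charset for the translator.
    return raw, UNKNOWN_CHARSET
```

This sketch trusts the ascii assumption blindly, as the proposal does; a
stricter version could call `raw.decode("ascii")` first and fail early on
8-bit input.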


* Handling the input translated document (in gettextize):

- If the master document's charset is ascii (not specified in the po), the
   translated document should stay in its own charset (the one specified on
   the command line or, failing that, the one detected by the format module),
   and the po charset should be set to it. If no charset can be determined,
   stop the process.
- If the master document's charset is utf-8, convert the translated document
   from its charset (the one specified on the command line or detected by the
   format module) to utf-8.
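A rough Python sketch of the gettextize step for the translated document. The
function `prepare_translation` and its parameters are hypothetical; any po
charset other than "utf-8" is treated as the ascii case. The sketch also stops
in the utf-8 case when no charset is known, since there would be nothing to
convert from; the proposal only spells this out for the ascii case.

```python
from typing import Optional, Tuple


def prepare_translation(raw: bytes,
                        master_po_charset: str,
                        cli_charset: Optional[str] = None,
                        detected_charset: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (translated bytes to put in msgstrs, po charset).

    Hypothetical helper following the two rules above.
    """
    source = cli_charset or detected_charset
    if source is None:
        # No charset from the command line or the format module: stop.
        raise ValueError("cannot determine the translated document's charset")
    if master_po_charset == "utf-8":
        # msgids are utf-8, so the msgstrs must be recoded to utf-8 too.
        return raw.decode(source).encode("utf-8"), "utf-8"
    # Master is ascii: keep the translation in its own charset and
    # record that charset in the po header.
    return raw, source
```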


* Handling the output translated document (in translate):

- Use the charset specified on the command line, or the po file's charset
   if none is specified.
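In Python the output rule reduces to a fallback. Both function names below are
hypothetical, and `text` is assumed to be the translated document already
decoded from the po charset.

```python
from typing import Optional


def output_charset(po_charset: str, cli_charset: Optional[str] = None) -> str:
    """Charset for the generated translated document (hypothetical helper)."""
    return cli_charset or po_charset


def write_output(text: str, po_charset: str,
                 cli_charset: Optional[str] = None) -> bytes:
    # Encode the already-decoded translation in the chosen output charset.
    return text.encode(output_charset(po_charset, cli_charset))
```

If the command-line charset cannot represent some translated characters (for
example utf-8 msgstrs written out as ascii), `str.encode` raises
`UnicodeEncodeError`, which is probably the behavior we want rather than
silently dropping characters.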


* Handling the addendum (in translate):

- It should be converted from the charset specified on the command line
   (mandatory) to the output document charset determined in the previous
   point.
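The addendum conversion is a plain recode between two known charsets. A Python
sketch with a hypothetical `convert_addendum` helper, where an empty charset
stands for "not given on the command line":

```python
def convert_addendum(raw: bytes, addendum_charset: str, output_charset: str) -> bytes:
    """Recode an addendum into the charset of the output document.

    Hypothetical helper: the addendum charset is mandatory, so a
    missing one is an error rather than a guess.
    """
    if not addendum_charset:
        raise ValueError("the addendum charset must be given on the command line")
    return raw.decode(addendum_charset).encode(output_charset)
```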


Did I miss something? Am I wrong in some points?

Oh, and one last question for now: should we recode everything, or just the
translated strings (assuming those are the only place where encoding issues
can arise)?

Regards,

Jordi Vilalta