[Po4a-devel]Non breaking spaces in man pages

Nicolas François nicolas.francois@centraliens.net
Fri, 18 Feb 2005 00:05:59 +0100


On Thu, Feb 17, 2005 at 12:43:39AM +0100, Jordi Vilalta wrote:
> Is there a way to enter non-breaking spaces from the keyboard? Or did they
> use some special software that differentiates both kinds of spaces?

Sometimes (on my box, it depends on the locale), 'Alt-Space' works. On
other systems, I've also seen some other combinations. With vim, you can use
a digraph: 'Ctrl-K' followed by 'NS'.
acheck may also be used to generate them in French.

I'm not using any PO editor (my fingers refuse to use anything other than
vim). In vim, 0xA0 can be differentiated (the default behavior depends on
the locale; in an ISO-8859-1 locale I think it is displayed as '| ', in
blue).
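
For editors that do not highlight 0xA0, the characters can also be located
with a small script. This is only a sketch in Python; the function name
find_nbsp and the default encoding are assumptions made for the example, and
nothing like it ships with po4a.

```python
NBSP = "\u00a0"  # 0xA0 in ISO-8859-1, bytes 0xC2 0xA0 in UTF-8


def find_nbsp(data: bytes, encoding: str = "iso-8859-1"):
    """Return (line, column) pairs, both 1-based, of every non-breaking space."""
    text = data.decode(encoding)
    hits = []
    # Split on "\n" only, so that other control characters do not shift
    # the reported line numbers.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((line_no, col))
    return hits
```

The encoding has to match the charset declared in the PO header; otherwise a
UTF-8 file read as ISO-8859-1 reports the 0xA0 byte one column too far.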

> >What I propose is to keep the conversion of 0xA0 to '\ ' in post_trans,
> >but remove the opposite conversion in pre_trans. Thus PO will be valid and
> >translators will be able (at their will) to use 0xA0 in the msgid (and
> >will have to set a correct charset in the header).
> 
> If I understood it well, it would be compatible with existing po files,
> but the newly created files would have "\ " instead of 0xA0? (I would like
> this approach)

A sort of compatibility: some existing strings would become fuzzy. Not a big
deal, I think.
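
The proposal quoted above can be sketched as follows. This is Python for
illustration only, not po4a's Perl code; the function names simply mirror
pre_trans and post_trans from the discussion, and their bodies are assumptions
about the intended behavior.

```python
NBSP = "\u00a0"  # the 0xA0 character


def pre_trans(text: str) -> str:
    # Proposed change: no longer convert roff's "\ " into 0xA0, so the
    # strings extracted into the PO file keep "\ " and stay plain ASCII.
    return text


def post_trans(text: str) -> str:
    # Kept conversion: a 0xA0 typed by a translator is written back
    # to the man page as roff's "\ ".
    return text.replace(NBSP, "\\ ")
```

With this asymmetry, post_trans("10\u00a0km") gives the roff text `10\ km`,
while a msgid that already contains `\ ` passes through pre_trans unchanged.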

> >Do you think we may keep the 0xA0 if the user specified an
> >$self->{TT}{'file_in_charset'} = UTF-8 or latin-1
> >(should we then check in_charset or out_charset ?)
> 
> It's the first time I see 0xA0, so I don't know many things about it. I
> see it like a strange character, hard to differentiate from the standard
> space in classic editors (correct me if I'm wrong). Personally I prefer
> having "\ " everywhere instead of 0xA0, independent from the character
> set.

Since it generates errors, I would also advocate "\ ".

> >I'm also asking this for the TeX module (there I'm doing translation of
> >accented characters, i.e. \'e in the TeX file becomes é in the PO which
> >is then translated again to \'e in the TeX file).
> 
> This is a different case because it's easy to differentiate an 'e' from an
> 'é' (both visually when reading and when writing each one).
> 
> Apart from this, I'll try to have a look at this conversion, because this
> introduces undetected non-ascii characters (which should force the po file
> to be in utf-8).

Do you mean that we could force the PO to become UTF-8 when these
conversions are performed?

'é' will also cause an error.

Supporting such translations could be useful for other languages (e.g.
XML, with &eacute; and other characters).
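
The TeX round trip discussed above, and the reason it pushes the PO towards a
non-ASCII charset, can be sketched like this. It is Python for illustration
only; the three-entry table and all function names are hypothetical, and a
real module would need a much larger mapping.

```python
# Hypothetical, deliberately tiny mapping from TeX accent commands
# to the accented characters they produce.
TEX_TO_CHAR = {"\\'e": "é", "\\`e": "è", "\\^e": "ê"}


def tex_to_po(text: str) -> str:
    # TeX source -> PO: replace accent commands with real characters.
    for tex, char in TEX_TO_CHAR.items():
        text = text.replace(tex, char)
    return text


def po_to_tex(text: str) -> str:
    # PO -> TeX output: restore the accent commands.
    for tex, char in TEX_TO_CHAR.items():
        text = text.replace(char, tex)
    return text


def needs_non_ascii_charset(text: str) -> bool:
    # True when the converted string can no longer be stored in an
    # ASCII-only PO file, so the header charset must cover it.
    return not text.isascii()
```

A string such as `caf\'e` is pure ASCII in the TeX file, but after tex_to_po
it contains 'é', which is the point where the PO charset starts to matter.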

Regards,
-- 
Nekral