[sane-standard] sane standard proposals (5) "character encoding"

Henning Meier-Geinitz henning@meier-geinitz.de
Mon, 11 Oct 2004 18:59:16 +0200


Hi,

On Mon, Oct 11, 2004 at 02:41:34AM +0200, Johannes Berg wrote:
> UTF-8 format
> All texts and translations: UTF-8 format (this is used in KDE and
> gtk+-2.x). Maybe UTF-8 should be enforced as the SANE_Char format.
> Currently ISO-8859-1 is used as the encoding.

UTF-8 seems to be the only encoding that is used widely anyway.
 
> while the standard specifies:
> Type SANE_String represents a text string as a sequence of C char
> values. The end of the sequence is indicated by a '\0' (NUL) character.
> 
> The latter is inconsistent with the definition (it should read 'as a
> sequence of SANE_Char values', I think);

Probably.

> but using UCS2 or UCS4 would
> have the disadvantage that using a single "\0" as a terminator is no
> longer possible since \0 may occur in a valid UCS2 stream (and will,
> when you write ASCII).

And that means you can't use the standard C string functions (strlen,
strcpy and so on) to handle such strings.
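
A minimal sketch of the problem, assuming a little-endian UCS-2 byte
layout. The names scan_utf8, scan_ucs2le and seen_by_strlen are
hypothetical and not part of the SANE standard; they only illustrate
how a char-based function stops at the first zero byte:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Scan" as UTF-8; for pure ASCII text this is just the plain C string. */
static const char scan_utf8[] = "Scan";

/* "Scan" as UCS-2 little-endian (assumed layout): each character takes
 * two bytes, and the high byte of every ASCII character is 0.
 * The last two bytes are the 16-bit terminator. */
static const char scan_ucs2le[] = { 'S', 0, 'c', 0, 'a', 0, 'n', 0, 0, 0 };

/* Length in bytes as a char-based C function sees it: strlen() stops at
 * the first zero byte, whatever encoding the buffer really holds. */
size_t seen_by_strlen(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

With these definitions, seen_by_strlen(scan_utf8) returns 4, while
seen_by_strlen(scan_ucs2le) returns 1 because the high byte of 'S' is
already taken for the terminator.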

> Also this bloats the transferred strings unnecessarily since all texts
> would be ASCII anyway (since they're English, and translation is only
> done in the frontend).

Even English texts could contain non-ASCII characters.

I think UTF-8 is ok as it is a superset of ASCII anyway.
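
To illustrate why this works with '\0'-terminated strings: in UTF-8
every ASCII character is encoded as itself, and all bytes of a
multi-byte sequence are 0x80 or higher, so a zero byte only ever means
NUL. A rough sketch follows; utf8_codepoints is a hypothetical helper,
not a SANE function, and it assumes the input is already valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (bit pattern 10xxxxxx).
 * Assumption: the input is valid UTF-8; no validation is done here. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Every byte that is not a continuation byte starts a character. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For an ASCII string such as "Scan" the result equals strlen(). For an
English word with accents, for example "r\xC3\xA9sum\xC3\xA9", strlen()
reports 8 bytes while utf8_codepoints() reports 6 characters, and the
string is still terminated by a single '\0'.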

Bye,
  Henning