[sane-standard] sane standard proposals (5) "character encoding"
Henning Meier-Geinitz
henning@meier-geinitz.de
Mon, 11 Oct 2004 18:59:16 +0200
Hi,
On Mon, Oct 11, 2004 at 02:41:34AM +0200, Johannes Berg wrote:
> UTF-8 format
> All texts and translations: UTF-8 format (this is used in KDE and
> gtk+-2.x). Maybe UTF-8 should be required as the SANE_Char format.
> Currently ISO-8859-1 is used as the encoding,
> while the standard specifies:
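Since the standard currently defines SANE_Char as ISO-8859-1, switching to UTF-8 only changes strings that contain bytes >= 0x80. A minimal sketch of that conversion, assuming NUL-terminated input; `latin1_to_utf8` is a hypothetical helper, not part of the SANE API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the SANE API): convert a NUL-terminated
 * ISO-8859-1 string to UTF-8. Each Latin-1 byte equals the Unicode code
 * point U+0000..U+00FF, so bytes below 0x80 are copied unchanged and the
 * rest become two-byte UTF-8 sequences. `out` must hold at least
 * 2 * strlen(in) + 1 bytes. Returns the number of bytes written, not
 * counting the terminating '\0'. */
size_t latin1_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p < 0x80) {
            out[n++] = (char)*p;                   /* ASCII: same byte in UTF-8 */
        } else {
            out[n++] = (char)(0xC0 | (*p >> 6));   /* lead byte 110xxxxx (0xC2 or 0xC3) */
            out[n++] = (char)(0x80 | (*p & 0x3F)); /* continuation byte 10xxxxxx */
        }
    }
    out[n] = '\0';
    return n;
}
```

Plain ASCII strings come out byte-for-byte identical, so backends that only use English ASCII text would need no change at all.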
> Type SANE_String represents a text string as a sequence of C char
> values. The end of the sequence is indicated by a '\0' (NUL) character.
>
> The latter is inconsistent with the definition (it should read 'as a
> sequence of SANE_Char values', I think);
Probably.
> but using UCS2 or UCS4 would
> have the disadvantage that using a single "\0" as a terminator is no
> longer possible since \0 may occur in a valid UCS2 stream (and will,
> when you write ASCII).
And that means you can't use the standard C string functions (strlen(),
strcpy(), strcmp(), ...) to handle such strings.
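To make the NUL problem concrete, here is a small illustration (the array and function names are made up for the example): the ASCII text "SANE" stored as UTF-8 and as big-endian UCS-2. strlen() reports 4 for the UTF-8 bytes but 0 for the UCS-2 bytes, because the very first byte is already 0x00 (with little-endian UCS-2 it would stop after 1 byte instead). The static_assert also shows the size doubling for ASCII text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "SANE" as UTF-8: identical to ASCII, one byte per character. */
static const char title_utf8[] = "SANE";

/* "SANE" as big-endian UCS-2 with a 16-bit 0x0000 terminator: every
 * ASCII character gets a 0x00 high byte. */
static const char title_ucs2be[] = { 0x00, 'S', 0x00, 'A', 0x00, 'N',
                                     0x00, 'E', 0x00, 0x00 };

/* UCS-2 needs twice the bytes of UTF-8 for pure ASCII text (10 vs. 5). */
static_assert(sizeof title_ucs2be == 2 * sizeof title_utf8,
              "UCS-2 doubles the size of ASCII text");

/* strlen() counts bytes up to the first '\0': correct for UTF-8,
 * but it stops at byte 0 of the UCS-2 buffer. */
size_t c_length_utf8(void) { return strlen(title_utf8); }   /* 4 */
size_t c_length_ucs2(void) { return strlen(title_ucs2be); } /* 0 */
```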
> Also this bloats the transferred strings unnecessarily, since all texts
> would be ASCII anyway (they're English, and translation is only
> done in the frontend).
Even English texts can contain non-ASCII characters (e.g. a degree sign
or a micro sign). I think UTF-8 is OK, as it is a superset of ASCII
anyway.
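Why UTF-8 keeps both properties (ASCII unchanged, '\0' still usable as the terminator) can be seen from a generic encoder. This is a sketch of the standard UTF-8 encoding rules, not code from SANE; `utf8_encode` is a hypothetical name. Code points below 0x80 become one identical byte, and every byte of a longer sequence has its high bit set, so no non-NUL character ever produces a 0x00 byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8 into `out` (room for 4 bytes).
 * Returns the number of bytes written. ASCII (< 0x80) is stored as the
 * same single byte; lead and continuation bytes of multi-byte sequences
 * are all >= 0x80, so they can never be mistaken for a '\0' terminator. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    /* Reject values outside Unicode and UTF-16 surrogates. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, 'A' stays the single byte 0x41, while the micro sign U+00B5 becomes 0xC2 0xB5: English ASCII text costs one byte per character, and only the occasional non-ASCII character costs more.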
Bye,
Henning