[Debburn-devel] What can I assume about libc?
Peter Samuelson
peter at p12n.org
Wed Oct 4 22:29:09 UTC 2006
[Lorenz Minder]
> I had a look. On Win32, wchar_t is actually 16 bit, so there's
> little risk of getting 4-byte chars.
Since Windows uses UTF-16, there is certainly a theoretical possibility
of having to handle 4-byte characters, i.e. code points above U+FFFF.
I have to admit I don't know how Windows deals with this given a 2-byte
wchar_t, but I assume it just splits such a character into two wchar_ts
(a surrogate pair), as though it were simple UCS-2.
For the purpose of UTF-8 conversion, treating the two halves of such a
4-byte character as two separate characters will spectacularly do the
wrong thing: each half gets encoded on its own, and the result is not
valid UTF-8.
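To make the failure concrete, here is a minimal sketch of the UTF-16 to
UTF-8 direction. The function names follow the list quoted below, but
the signatures are assumptions made for illustration and nothing here
validates its input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from UTF-16.
 * Returns the number of 16-bit units consumed (1 or 2); len must be >= 1.
 * No validation: a lone surrogate is passed through unchanged. */
static size_t utf16_to_u32(const uint16_t *in, size_t len, uint32_t *cp)
{
    if (len >= 2 && in[0] >= 0xD800 && in[0] <= 0xDBFF
                 && in[1] >= 0xDC00 && in[1] <= 0xDFFF) {
        /* high + low surrogate -> one code point above U+FFFF */
        *cp = 0x10000 + (((uint32_t)in[0] - 0xD800) << 10)
                      + ((uint32_t)in[1] - 0xDC00);
        return 2;
    }
    *cp = in[0];
    return 1;
}

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written (1 to 4); out needs room for 4.
 * Assumes cp <= 0x10FFFF and does not reject surrogates. */
static size_t u32_to_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For U+1D11E (UTF-16: D834 DD1E), combining the pair first yields the
four bytes F0 9D 84 9E. Feeding each half to u32_to_utf8 separately
yields ED A0 B4 ED B4 9E instead, which is six bytes and not valid
UTF-8.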
> >Custom functions may be best:
> >
> > utf8_to_u32
> > u32_to_utf8
> > u32_to_utf16
> > utf16_to_u32
>
> Or we can just use libiconv instead for this purpose, which
> apparently also exists for Windows.
Well, those 4 functions are utterly trivial to write, much easier than
dealing with iconv (and its platform availability) - _if_ we don't have
to fully validate our input. Completely validating a stream of Unicode
(be it UTF-8 or UTF-16) is a whole other story. I'm not certain
whether even iconv bothers to do _that_.
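As a rough illustration of how small the non-validating versions are,
here is a sketch of the UTF-8 to UTF-16 direction. The signatures are
assumptions, not an agreed interface, and the decoder deliberately
trusts the lead byte: it does not check for truncated input, bad
continuation bytes, overlong forms or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from UTF-8.
 * Returns the number of bytes consumed (1 to 4).
 * The caller must guarantee that enough bytes are available. */
static size_t utf8_to_u32(const unsigned char *in, uint32_t *cp)
{
    if (in[0] < 0x80) {
        *cp = in[0];
        return 1;
    }
    if (in[0] < 0xE0) {
        *cp = ((uint32_t)(in[0] & 0x1F) << 6)
            |  (uint32_t)(in[1] & 0x3F);
        return 2;
    }
    if (in[0] < 0xF0) {
        *cp = ((uint32_t)(in[0] & 0x0F) << 12)
            | ((uint32_t)(in[1] & 0x3F) << 6)
            |  (uint32_t)(in[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(in[0] & 0x07) << 18)
        | ((uint32_t)(in[1] & 0x3F) << 12)
        | ((uint32_t)(in[2] & 0x3F) << 6)
        |  (uint32_t)(in[3] & 0x3F);
    return 4;
}

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2); out needs room for 2.
 * Assumes cp <= 0x10FFFF. */
static size_t u32_to_utf16(uint32_t cp, uint16_t *out)
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

The missing validation is easy to demonstrate: this decoder accepts the
overlong sequence C0 80 and returns U+0000 for it, which a strict
UTF-8 decoder has to reject. Catching that and the other malformed
cases is where the real work is.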