[Debburn-devel] What can I assume about libc?

Peter Samuelson peter at p12n.org
Wed Oct 4 22:29:09 UTC 2006


[Lorenz Minder]
> I had a look.  On Win32, wchar_t is actually 16 bit, so there's
> little risk of getting 4-byte chars.

Since Windows uses UTF-16, there is certainly a theoretical possibility
of having to handle 4-byte characters.  I have to admit I don't know
how Windows deals with this given a 2-byte wchar_t, but I assume it
just splits such a character into two wchar_ts, as though it were
simple UCS-2.

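For the purpose of UTF-8 conversion, treating the two halves of such a
character (a UTF-16 surrogate pair) as two separate characters will
spectacularly do the wrong thing: each half gets encoded as its own
3-byte sequence, which is not valid UTF-8, instead of the pair becoming
one 4-byte sequence.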
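
To make the point concrete, here is a minimal sketch of how the two
halves have to be recombined before any UTF-8 encoding happens.  The
name utf16_pair_to_u32 is hypothetical (it is not an existing function
in the tree), and the sketch assumes the caller has already checked
that both units are in the surrogate ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into a single
 * code point.  Assumes hi is a high surrogate (0xD800..0xDBFF) and lo
 * is a low surrogate (0xDC00..0xDFFF); the asserts only document that
 * precondition, they are not real input validation. */
static uint32_t utf16_pair_to_u32(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);

    /* Each half carries 10 payload bits; the result is offset by
     * 0x10000 because pairs only encode code points above the BMP. */
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                       | (uint32_t)(lo - 0xDC00));
}
```

For example, the pair 0xD834 0xDD1E should come out as U+1D11E, which
is the value that then has to be handed to the UTF-8 encoder.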

> >Custom functions may be best:
> > 
> > utf8_to_u32
> > u32_to_utf8
> > u32_to_utf16
> > utf16_to_u32
> 
> Or we can just use libiconv instead for this purpose, which
> apparently also exists for Windows.

Well, those 4 functions are utterly trivial to write, much easier than
dealing with iconv (and its platform availability) - _if_ we don't have
to fully validate our input.  Completely validating a stream of Unicode
(be it UTF-8 or UTF-16) is a whole other story.  I'm not certain
whether even iconv bothers to do _that_.
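
As a rough illustration of the "trivial if we don't validate" case,
three of the four functions could look something like the sketch
below.  The signatures are my own assumption, not anything agreed on
in this thread.  The sketch deliberately does no validation: it trusts
the lead byte, does not check continuation bytes or buffer lengths, and
does not reject surrogates, overlong forms, or values above 0x10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes assumed).
 * Returns the number of bytes written.  No validation of c. */
static size_t u32_to_utf8(uint32_t c, unsigned char *out)
{
    assert(out != NULL);
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | ((c >> 18) & 0x07));
    out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}

/* Decode one UTF-8 sequence starting at in.  Returns the number of
 * bytes consumed.  Trusts the lead byte and assumes the continuation
 * bytes are present and well formed. */
static size_t utf8_to_u32(const unsigned char *in, uint32_t *c)
{
    assert(in != NULL && c != NULL);
    if (in[0] < 0x80) {
        *c = in[0];
        return 1;
    }
    if (in[0] < 0xE0) {
        *c = ((uint32_t)(in[0] & 0x1F) << 6) | (uint32_t)(in[1] & 0x3F);
        return 2;
    }
    if (in[0] < 0xF0) {
        *c = ((uint32_t)(in[0] & 0x0F) << 12)
           | ((uint32_t)(in[1] & 0x3F) << 6)
           | (uint32_t)(in[2] & 0x3F);
        return 3;
    }
    *c = ((uint32_t)(in[0] & 0x07) << 18)
       | ((uint32_t)(in[1] & 0x3F) << 12)
       | ((uint32_t)(in[2] & 0x3F) << 6)
       | (uint32_t)(in[3] & 0x3F);
    return 4;
}

/* Encode one code point as UTF-16 into out (room for 2 units assumed).
 * Returns the number of 16-bit units written.  No validation of c. */
static size_t u32_to_utf16(uint32_t c, uint16_t *out)
{
    assert(out != NULL);
    if (c < 0x10000) {
        out[0] = (uint16_t)c;
        return 1;
    }
    c -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (c >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (c & 0x3FF)); /* low surrogate */
    return 2;
}
```

Everything that makes full validation "a whole other story" is exactly
what this leaves out, so it is only suitable where the input is already
known to be well formed.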