[Debburn-devel] What can I assume about libc?

Peter Samuelson peter at p12n.org
Thu Oct 5 01:18:21 UTC 2006


[Lorenz Minder]
> But doing such a thing as mapping a character to two wchar_ts would
> subvert the very purpose of wchar_t, namely to map each "basic
> element" to a _single_ wchar_t.

You'd think so, wouldn't you?

> >as though it were simple UCS-2.
> 
> Ok, I don't know what UCS-2 is.  A quick google search tells me that
> "UCS-2 is a fixed-length (16 bits) subset of UTF-16, able to represent
> the basic multilingual plane only."

That's correct.  UCS-2 can only represent the BMP (U+0000 - U+FFFF).
UTF-16 is identical to UCS-2 for the BMP, but also specifies a 4-byte
way to represent the rest of Unicode, U+10000 - U+10FFFF.  Note also
that each half of a 4-byte UTF-16 character (a surrogate, in the range
U+D800 - U+DFFF) is illegal in UCS-2, so anything that sufficiently
validates its input won't be confused even if it assumes UCS-2.
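
To make the surrogate point concrete, here is a minimal sketch of the
check and of decoding one code point from UTF-16 code units.  The names
is_surrogate() and utf16_decode() are made up for illustration; they
are not from libc or from any existing debburn code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 16-bit unit in U+D800 - U+DFFF is half of a UTF-16 pair and is
 * not a valid character in UCS-2. */
static int is_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDFFF;
}

/* Decode one code point from in[0..n).  Stores it in *cp and returns
 * the number of 16-bit units consumed (1 or 2), or 0 if the input is
 * empty or malformed (lone or mismatched surrogate). */
static size_t utf16_decode(const uint16_t *in, size_t n, uint32_t *cp)
{
    uint16_t hi, lo;

    assert(in != NULL && cp != NULL);
    if (n == 0)
        return 0;
    hi = in[0];
    if (!is_surrogate(hi)) {        /* plain BMP character, same as UCS-2 */
        *cp = hi;
        return 1;
    }
    if (hi >= 0xDC00 || n < 2)      /* low surrogate first, or pair cut short */
        return 0;
    lo = in[1];
    if (lo < 0xDC00 || lo > 0xDFFF) /* high surrogate without a low one */
        return 0;
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 2;
}
```

For example, the units 0xD834 0xDD1E decode to U+1D11E, while a
UCS-2-only consumer that validates its input would reject both units
via is_surrogate() instead of misreading them as two characters.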

I think the story is this: NT4 uses UCS-2 (specifically little-endian).
Windows 2000 adds support for UTF-16 (UTF-16LE).  wchar_t is a relic
from NT3/NT4 which can't be changed now without breaking the Win32 ABI.

> The list above lacks u32_to_lc_ctype, though, which is also needed.
> I gather that would be equally trivial to do?

If we need LC_CTYPE, we can use wcstombs() and mbstowcs(), which are
standard C functions (C89 as well as C99) that use wchar_t.  However, I
don't know whether anything can be assumed about the structure of
wchar_t, such that it can be used non-opaquely by functions outside
libc itself.  It'd be nice if we could assume wchar_t is really
native-endian UCS-2 or UCS-4 (depending on the size of the type), but I
hesitate to do that.  Also, I don't know the availability of these
functions on semi-modern platforms.
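
As a rough sketch of what I mean, the following treats wchar_t as
opaque and only goes through mbstowcs()/wcstombs().  The helper names
roundtrip_lc_ctype() and wchar_is_iso10646() are hypothetical, and the
fixed 256-element buffer is an arbitrary choice for the example.  The
one hint I know of about wchar_t's contents is the C99 macro
__STDC_ISO_10646__, which an implementation may define if wchar_t
values are ISO 10646 code points; it says nothing about 16-bit
wchar_t and surrogates, so I would not lean on it too hard.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if the implementation promises (via the C99 macro) that
 * wchar_t values are ISO 10646 code points, 0 if it makes no such
 * promise. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Convert a string in the current LC_CTYPE encoding to wchar_t and
 * back again.  Returns 0 on success, -1 on an invalid sequence or if
 * either buffer is too small.  The caller is expected to have called
 * setlocale(LC_CTYPE, "") if it wants the user's locale rather than
 * the default "C" locale. */
static int roundtrip_lc_ctype(const char *in, char *out, size_t outsz)
{
    wchar_t wbuf[256];
    const size_t wmax = sizeof wbuf / sizeof wbuf[0];
    size_t n, m;

    assert(in != NULL && out != NULL);

    n = mbstowcs(wbuf, in, wmax);
    if (n == (size_t)-1 || n >= wmax)   /* invalid input, or no room for L'\0' */
        return -1;

    m = wcstombs(out, wbuf, outsz);
    if (m == (size_t)-1 || m >= outsz)  /* unrepresentable, or no room for '\0' */
        return -1;

    return 0;
}
```

A u32_to_lc_ctype would presumably need one more step, turning UCS-4
values into wchar_t, and that step is exactly where the assumption
about wchar_t's structure comes in.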