[Debburn-devel] Sample mini-iconv (Was: What can I assume about libc?)

Albert Cahalan acahalan at gmail.com
Thu Oct 5 16:21:23 UTC 2006


On 10/5/06, Peter Samuelson <peter at p12n.org> wrote:
>
> So, the real reason I'm replying is to fix some stupid bugs in my
> routines.  I was handling the size_t counters all wrong in two of the
> functions.  Still haven't tested my code, mind.

I just tested round-trip behavior. It appears to be unreliable.
Mind you, the test code is trash that I almost didn't even
bother to indent. I'll attach it.

> > The "len" parameter might be a hazard. It seems to be in terms of the
> > number of input characters. Usually the more important concern is the
> > number of output bytes.  People may be tempted to use 2*n to allocate
> > space for n characters when the output is UTF-16, which would allow
> > for buffer overflows. Perhaps it is best to have both input
> > characters and output bytes specified.
>
> Valid concern.  Fixed.

I guess that does it. I was thinking of something that
could work with raw sizeof, but this can be useful too.
Adding a 5th parameter might be too unwieldy.
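
To illustrate the sizing hazard quoted above, here is a minimal sketch. It assumes the input is an array of UCS-4 code points held in uint32_t; the names utf16_bytes_needed() and utf16_bytes_worst_case() are hypothetical and are not part of the posted routines. Code points above U+FFFF need a surrogate pair, so 2*n bytes is not always enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the posted code): exact number of bytes
 * needed to encode n UCS-4 code points as UTF-16, terminator excluded. */
size_t utf16_bytes_needed(const uint32_t *src, size_t n)
{
    size_t bytes = 0;
    size_t i;

    assert(src != NULL || n == 0);
    for (i = 0; i < n; i++) {
        /* Above U+FFFF a surrogate pair is required: 4 bytes, not 2. */
        bytes += (src[i] > 0xFFFF) ? 4 : 2;
    }
    return bytes;
}

/* Upper bound for callers that size the buffer before scanning the input.
 * The caller must make sure n * 4 does not overflow size_t. */
size_t utf16_bytes_worst_case(size_t n)
{
    return n * 4;
}
```

For the three code points U+0041, U+20AC, and U+1D11E, the exact size is 2 + 2 + 4 = 8 bytes, while a 2*n allocation would provide only 6.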

> > Since these functions are memcpy-like (good IMHO),
> > some strlen-like functions may also be needed.
>
> Trivial ucs4_strnlen() added.

I didn't test this. Probably a utf16 one is useful too.
The utf8 one is trivial (call strlen), but perhaps good
for documentation reasons.
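
A UTF-16 counterpart might look like the sketch below. The name utf16_strnlen() and the use of uint16_t for code units are assumptions made for illustration; the posted ucs4_strnlen() may use a different signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical utf16_strnlen(): count the 16-bit code units that precede
 * the terminating 0, examining at most maxlen units. */
size_t utf16_strnlen(const uint16_t *s, size_t maxlen)
{
    size_t n = 0;

    assert(s != NULL || maxlen == 0);
    while (n < maxlen && s[n] != 0)
        n++;
    return n;
}
```

Like strlen() on UTF-8, this counts code units rather than characters, so a surrogate pair contributes 2 to the result.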

> > BTW, for the Windows port one might define UNICODE.
> > I believe this gets you all the OS interfaces in UTF-16.
>
> So that all functions take wchar_t* instead of char*?  That sounds a
> bit scary.  I don't pretend to understand Win32, though.

Well, not ALL functions. Just the Windows API ones.
The header files are something like this:

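/* Simplified sketch; the real headers use typedefs and more indirection. */
#define wchar_t unsigned short     /* 16-bit UTF-16 code unit on Windows */
char SomeFunctionA(char*);         /* "ANSI" (8-bit code page) variant */
wchar_t SomeFunctionW(wchar_t*);   /* wide (UTF-16) variant */
#ifdef UNICODE
#define TCHAR wchar_t
#define SomeFunction SomeFunctionW
#else
#define TCHAR char
#define SomeFunction SomeFunctionA
#endif
//
// Stuff documented:
// char SomeFunctionA(char*);
// wchar_t SomeFunctionW(wchar_t*);
// TCHAR SomeFunction(TCHAR*);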

The idea is that you use TCHAR to write files
that compile either way. Actually, they also
define CHAR (useless; it is always char) and
WCHAR for wchar_t. They even do it for LONG,
etc., though note that LONG stays 32 bits on
Win64, unlike long on most 64-bit Unix systems.
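
A minimal sketch of code written to compile either way. To stay self-contained and buildable outside Windows, it uses hypothetical stand-ins (MY_TCHAR, MY_TEXT, my_tcslen, tlen) rather than the real headers; on Windows the corresponding pieces would be TCHAR, the TEXT() macro, and _tcslen from <tchar.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the TCHAR machinery, selected by UNICODE. */
#ifdef UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT(s) L##s      /* turn "abc" into the wide literal L"abc" */
#define my_tcslen  wcslen
#else
typedef char MY_TCHAR;
#define MY_TEXT(s) s
#define my_tcslen  strlen
#endif

/* Compiles unchanged in both modes; returns the length in MY_TCHAR units. */
size_t tlen(const MY_TCHAR *s)
{
    assert(s != NULL);
    return my_tcslen(s);
}
```

Building with -DUNICODE switches every MY_TCHAR string to wide characters without touching the calling code, for example tlen(MY_TEXT("hello")).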
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uniconv-1.c
Type: text/x-csrc
Size: 6499 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/debburn-devel/attachments/20061005/862efd51/uniconv-1.c

