[gopher] CAPS capability: ServerDefaultCharset

Kim Holviala kim at holviala.com
Sat Jan 3 17:58:44 UTC 2015


> On 03 Jan 2015, at 19:12, James Mills <prologic at shortcircuit.net.au> wrote:
>> Look at the function strniconv() on string.c line 144. Not the easiest to understand because it has lots of bit handling - you should probably read & fully understand some UTF-8 docs before trying to figure out what my code does and why it works the way it does.
> 
> Interesting :) Got a link to some docs/refs that explain how this works? (I assume you followed some well understood algo)

Nope, there is no algo here, just my own code based on my reading of the official docs.

Basic idea:

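- the first 128 code points (0x00-0x7f) of US-ASCII & Latin-1 & UTF-8 are the same, one byte each
- the first 256 code points of Latin-1 & Unicode are the same, so a Latin-1 byte is its own code point (but UTF-8 needs two bytes for 0x80-0xff)
- the way UTF-8 encodes values of 0x80 and above is easy to detect from the byte stream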
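
To illustrate the last point, here is a minimal sketch of how the role of each byte in a UTF-8 stream can be read from its top bits. This is not the code from string.c; the function name utf8_seq_len() is made up for this example, and it does not reject the overlong lead bytes 0xC0/0xC1 or the out-of-range 0xF5-0xF7.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from gophernicus): classify one byte of a
 * UTF-8 stream by its top bits.
 *
 * Returns the total sequence length announced by a lead byte (1-4),
 * 0 for a continuation byte, or -1 for a byte that has no valid
 * meaning in UTF-8 (0xF8-0xFF).
 */
static int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx: same as US-ASCII   */
    if ((b & 0xC0) == 0x80) return 0;   /* 10xxxxxx: continuation byte  */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx: 2-byte sequence    */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx: 3-byte sequence    */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx: 4-byte sequence    */
    return -1;                          /* 11111xxx: never valid        */
}
```

Because a lead byte and a continuation byte can never be confused, a decoder can tell where each character starts without any out-of-band information.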

Based on that I made a function which:

- first upconverts the input, whichever of the three it is, to 32-bit Unicode code points (UTF-32)
- if output = US-ASCII use the translation table defined in gophernicus.h to downconvert 0x80 - 0xff to 7-bit
- if output = Latin-1 let 0x00-0xff pass through and discard anything > 0xff
- if output = UTF-8 just encode anything from 0x80 upwards using the UTF-8 multi-byte encoding
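
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual strniconv(): the names charset, decode_one(), encode_one() and convert() are invented for the example, the real translation table from gophernicus.h is replaced by a plain '?' placeholder, and malformed UTF-8 is handled in the simplest possible way (overlong forms are not rejected).

```c
#include <stddef.h>
#include <stdint.h>

enum charset { CS_ASCII, CS_LATIN1, CS_UTF8 };   /* hypothetical names */

/* Decode one character from in[0..len) (len >= 1) into a 32-bit code
 * point. Returns the number of input bytes consumed (always >= 1). */
static size_t decode_one(enum charset cs, const unsigned char *in,
                         size_t len, uint32_t *cp)
{
    unsigned char b = in[0];
    size_t need, i;

    /* US-ASCII and Latin-1 bytes are their own code points */
    if (cs != CS_UTF8 || b < 0x80) { *cp = b; return 1; }

    if ((b & 0xE0) == 0xC0)      { need = 1; *cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { need = 2; *cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { need = 3; *cp = b & 0x07; }
    else { *cp = '?'; return 1; }              /* stray or invalid byte */

    if (need >= len) { *cp = '?'; return 1; }  /* truncated sequence */
    for (i = 1; i <= need; i++) {
        if ((in[i] & 0xC0) != 0x80) { *cp = '?'; return 1; }
        *cp = (*cp << 6) | (in[i] & 0x3F);     /* append 6 payload bits */
    }
    return need + 1;
}

/* Encode one code point into out (needs room for 4 bytes).
 * Returns the number of bytes written; 0 means "discarded". */
static size_t encode_one(enum charset cs, uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }

    switch (cs) {
    case CS_ASCII:
        out[0] = '?';   /* assumption: stands in for the table lookup */
        return 1;
    case CS_LATIN1:
        if (cp > 0xFF) return 0;               /* not representable */
        out[0] = (unsigned char)cp;
        return 1;
    default:                                   /* CS_UTF8 */
        if (cp < 0x800) {
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        }
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
}

/* Convert inlen bytes; stops early if fewer than 4 output bytes remain.
 * Returns the number of bytes written to out. */
static size_t convert(enum charset from, enum charset to,
                      const unsigned char *in, size_t inlen,
                      unsigned char *out, size_t outsize)
{
    size_t i = 0, o = 0;

    while (i < inlen && o + 4 <= outsize) {
        uint32_t cp;
        i += decode_one(from, in + i, inlen - i, &cp);
        o += encode_one(to, cp, out + o);
    }
    return o;
}
```

With this sketch, converting the single Latin-1 byte 0xE4 to UTF-8 yields the two bytes 0xC3 0xA4, and converting those two bytes back to Latin-1 yields 0xE4 again. Going through a 32-bit intermediate keeps the decode and encode halves independent, so three input charsets and three output charsets need only two small functions rather than nine special cases.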



- Kim






