[Dict-common-dev] UTF-8 and ispell

Agustin Martin agustin.martin at hispalinux.es
Sat Sep 29 18:43:46 UTC 2007


On Sat, Sep 29, 2007 at 11:03:02AM +0200, Rafael Laboissiere wrote:

> I think that the maintainer of aspell-bg should provide a coherent
> info-aspell file; I mean, if "Coding-System: cp1251" is declared in this
> file, then all the *chars fields of the corresponding entry should be in
> that encoding.  Should I file a bug report against aspell-bg?
> 
> At any rate, the strings in jed-ispell-dicts.sl are too long for aspell-bg
> and ispell_init.sl fails here with the error message:
> 
> /var/cache/dictionaries-common/jed-ispell-dicts.sl:232: String too long for buffer: found '??'

We could try converting the octal codes into raw bytes with something like

$additionalchars =~ s/\\([0-3][0-7][0-7])/chr(oct($1))/ge;

before the conversion to UTF-8. I expect this to produce a string of
single-byte chars (I hope locale settings do not interfere with this) and to
leave anything that is not an octal code as is.
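A minimal Perl sketch of the two steps (octal decoding, then conversion to UTF-8). It assumes the entry declares "Coding-System: cp1251" as in the aspell-bg case and uses the core Encode module for the conversion. The sample string and the $utf8 variable are hypothetical; this is not the actual dictionaries-common code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: two Bulgarian letters written as cp1251 octal escapes,
# followed by a literal ASCII apostrophe that must be left untouched.
# In cp1251, \340 is U+0430 (small a) and \341 is U+0431 (small be).
my $additionalchars = q{\340\341'};

# Step 1: replace each backslash + three octal digits with the single byte
# it denotes; anything that is not an octal escape is left as is.
$additionalchars =~ s/\\([0-3][0-7][0-7])/chr(oct($1))/ge;

# Step 2: interpret the resulting bytes in the declared Coding-System
# (cp1251 assumed here) and re-encode them as UTF-8 bytes.
my $utf8 = encode('UTF-8', decode('cp1251', $additionalchars));

print $utf8, "\n";
```

On a UTF-8 terminal this should print the two Cyrillic letters followed by the apostrophe. Decoding the escapes first keeps the regex itself encoding-agnostic; only the decode() call needs to know the Coding-System.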

If it works properly, this should cover all cases. Editing an info file that
mixes different encodings is a mess; that is why octal codes in the required
encoding are useful.

--
Agustin


