UTF-8 and ispell

G. Milde milde at users.sourceforge.net
Fri Sep 21 16:11:49 UTC 2007

On 21.09.07, Rafael Laboissiere wrote:
> * G. Milde <milde at users.sourceforge.net> [2007-09-20 11:31]:

> Actually, my mental model of how the whole thing works was wrong.  The
> jed-ispell-dicts.sl is automatically generated by dictionaries-common at
> installation time for package i<language> from the information provided in
> file debian/i<language>.info-ispell (also in
> /var/lib/dictionaries-common/ispell/i<language>).

> If a new record is created in this file containing, as you suggested:

>     Language: deutsch (New German 8 bit UTF-8)
>     Hash-Name: ngerman
>     Emacsen-Name: german-new8-utf8
>     Casechars: [A-Za-zÄÖÜäößü]
>     Not-Casechars: [^A-Za-zÄÖÜäößü]
>     Otherchars: [']
>     Many-Otherchars: no
>     Additionalchars: ÄÖÜäößü
>     Ispell-Args: -C -d ngerman
>     Extended-Character-Mode: ~utf8
>     Coding-System: utf-8
>     Locale: de_DE

> then the following would appear in jed-ispell-dicts.sl:

>     ispell_add_dictionary (
>       "german-new8-utf8",
>       "ngerman",
>       "ÄÖÜäößü",
>       "[']",
>       "~utf8",
>       "-C -d ngerman");

> So, my conclusion is that it is neither jed-extra's nor
> dictionaries-common's responsibility to provide utf-8 support for
> ispell.sl but rather it is up to the individual i<language> package to
> provide it through the debian/i<language>.info-ispell files. (I will
> consider filing bug reports against the ispell dictionary packages.)

Yes, indeed, this should be solved at the level of the ispell dictionary
packages.

> The only downside of this approach is that users will be provided with both
> choices "<language>" and "<language>-utf8" when calling
> ispell_change_dictionary although only one of them will make ispell.sl work
> correctly according to the character encoding system used.

> It would be good if non-UTF8 possibilities could be filtered out when
> _slang_utf8_ok is set, probably by looking at the extchr argument passed to
> ispell_add_dictionary().  [Paul: what do you think?]

A simple method would be to put each ispell_add_dictionary() call in a try
clause. An invalid string argument would then cause the "guilty" dictionary
to be skipped.
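
As a rough sketch only (untested), each generated entry in
jed-ispell-dicts.sl could be wrapped as shown below. It reuses the example
call quoted above. It assumes S-Lang 2 try/catch syntax and that an invalid
string argument really raises an exception that can be caught at this point.
The message text is made up for illustration.

```slang
% Sketch: wrap one generated entry so that a failure skips this dictionary
% instead of aborting the evaluation of the whole file.
try
{
   ispell_add_dictionary (
      "german-new8-utf8",
      "ngerman",
      "ÄÖÜäößü",
      "[']",
      "~utf8",
      "-C -d ngerman");
}
catch AnyError:
{
   % Assumption: reaching this block means the entry could not be added.
   message ("ispell: skipped dictionary german-new8-utf8");
}
```

Catching AnyError is deliberately broad here. If the exact exception class
turns out to be known, catching only that one would avoid hiding unrelated
errors.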

You would still have <language>-utf8 with non-utf8 Jed, but this should
still work and may even be useful in some cases.

