UTF-8 and ispell
Rafael Laboissiere
rafael at debian.org
Fri Sep 21 07:20:17 UTC 2007
[Cc'ing to Paul. Paul: I know you are subscribed to this ML, but I wanted
to be sure you will see my question at the end of this post.]
* G. Milde <milde at users.sourceforge.net> [2007-09-20 11:31]:
> It should look for utf8 in the aff files and add a line like::
>
> deutsch (Old German UTF-8)
>
> to ispell-dicts-list.txt for every dictionary providing 'altstringtype "utf8"'
>
> jed-ispell-dicts.sl should then contain something like ::
>
> ispell_add_dictionary (
> "german-old-tex",
> "ogerman",
> "\"",
> "[']",
> "~tex",
> "-C -d ogerman");
>
> if (_slang_utf8_ok) {
> ispell_add_dictionary (
> "german-old-utf8",
> "ogerman",
> "ÄÖÜäößü",
> "[']",
> "~utf8",
> "-C -d ogerman");
> } else {
> ispell_add_dictionary (
> "german-old8",
> "ogerman",
> "ÄÖÜäößü",
> "[']",
> "~latin1",
> "-C -d ogerman");
> }
>
> so that the correct argument is passed to ispell.
>
> This now works in both UTF-8 and latin1 enabled jed.
>
> (I did not check how this could be done and how it fits in the
> dictionaries-common policy.)
Actually, my mental model of how the whole thing works was wrong. The
jed-ispell-dicts.sl file is generated automatically by dictionaries-common
when package i<language> is installed, from the information provided in the
file debian/i<language>.info-ispell (also in
/var/lib/dictionaries-common/ispell/i<language>). In the ingerman package,
this file contains:
Language: deutsch (New German -tex mode-)
Hash-Name: ngerman
Emacsen-Name: german-new
Casechars: [A-Za-z\"]
Not-Casechars: [^A-Za-z\"]
Otherchars: [']
Many-Otherchars: no
Additionalchars: \"
Ispell-Args: -C -d ngerman
Extended-Character-Mode: ~tex
Coding-System: iso-8859-1
Locale: de_DE
Language: deutsch (New German 8 bit)
Hash-Name: ngerman
Emacsen-Name: german-new8
Casechars: [A-Za-zÄÖÜäößü]
Not-Casechars: [^A-Za-zÄÖÜäößü]
Otherchars: [']
Many-Otherchars: no
Additionalchars: ÄÖÜäößü
Ispell-Args: -C -d ngerman
Extended-Character-Mode: ~latin1
Coding-System: iso-8859-1
Locale: de_DE
If a new record is created in this file containing, as you suggested:
Language: deutsch (New German 8 bit UTF-8)
Hash-Name: ngerman
Emacsen-Name: german-new8-utf8
Casechars: [A-Za-zÄÖÜäößü]
Not-Casechars: [^A-Za-zÄÖÜäößü]
Otherchars: [']
Many-Otherchars: no
Additionalchars: ÄÖÜäößü
Ispell-Args: -C -d ngerman
Extended-Character-Mode: ~utf8
Coding-System: utf-8
Locale: de_DE
then the following would appear in jed-ispell-dicts.sl:
ispell_add_dictionary (
"german-new8-utf8",
"ngerman",
"ÄÖÜäößü",
"[']",
"~utf8",
"-C -d ngerman");
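To make the mapping from record fields to call arguments explicit, here is a rough Python sketch. It is only an illustration and not the actual dictionaries-common generator: the names `parse_records` and `to_slang_call` are hypothetical, and the field-to-argument mapping (Emacsen-Name, Hash-Name, Additionalchars, Otherchars, Extended-Character-Mode, Ispell-Args) is inferred from the example above. Values are copied verbatim between the quotes, which is an assumption based on how `\"` shows up in the tex entry.

```python
# Hypothetical sketch: turn info-ispell records into ispell_add_dictionary calls.
# Not the real dictionaries-common code; the field mapping is inferred from the thread.

def parse_records(text):
    """Split info-ispell text into dicts; a new record starts at each 'Language:' field."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "Language" or not records:
            records.append({})
        records[-1][key] = value
    return records


def to_slang_call(rec):
    """Format one record as an S-Lang ispell_add_dictionary call (values copied verbatim)."""
    args = [
        rec["Emacsen-Name"],
        rec["Hash-Name"],
        rec.get("Additionalchars", ""),
        rec.get("Otherchars", ""),
        rec.get("Extended-Character-Mode", ""),
        rec.get("Ispell-Args", ""),
    ]
    body = ",\n".join('"%s"' % a for a in args)
    return "ispell_add_dictionary (\n" + body + ");"


SAMPLE = """\
Language: deutsch (New German 8 bit UTF-8)
Hash-Name: ngerman
Emacsen-Name: german-new8-utf8
Casechars: [A-Za-zÄÖÜäößü]
Not-Casechars: [^A-Za-zÄÖÜäößü]
Otherchars: [']
Many-Otherchars: no
Additionalchars: ÄÖÜäößü
Ispell-Args: -C -d ngerman
Extended-Character-Mode: ~utf8
Coding-System: utf-8
Locale: de_DE
"""

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(to_slang_call(record))
```

Under these assumptions, adding a UTF-8 record to the info-ispell file is all a dictionary package has to do; the generated S-Lang entry follows mechanically.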
So, my conclusion is that it is neither jed-extra's nor
dictionaries-common's responsibility to provide UTF-8 support for
ispell.sl; rather, it is up to the individual i<language> package to
provide it through the debian/i<language>.info-ispell files. (I will
consider filing bug reports against the ispell dictionary packages.)
The only downside of this approach is that users will be offered both
choices, "<language>" and "<language>-utf8", when calling
ispell_change_dictionary, although only one of them will make ispell.sl work
correctly with the character encoding in use.
It would be good if the non-UTF-8 choices could be filtered out when
_slang_utf8_ok is set, probably by looking at the extchr argument passed to
ispell_add_dictionary(). [Paul: what do you think?]
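The filtering rule I have in mind could look roughly like the following. It is written in Python purely as an illustration, not as a patch to ispell.sl. The function name `usable_dictionaries` and the (name, extchr) pair layout are hypothetical. The rule assumes that "~utf8" marks UTF-8 dictionaries, that "~tex" entries are plain ASCII and therefore usable in either mode, and that any other extchr value denotes an 8-bit encoding.

```python
# Hypothetical sketch of the proposed filter; the real logic would live in ispell.sl.

def usable_dictionaries(dicts, utf8_ok):
    """Return the names of dictionaries that match the editor's encoding mode.

    dicts   -- iterable of (name, extchr) pairs, as passed to ispell_add_dictionary
    utf8_ok -- stand-in for the S-Lang variable _slang_utf8_ok
    """
    keep = []
    for name, extchr in dicts:
        if extchr == "~tex":
            # Assumption: TeX-mode dictionaries are ASCII-only, so always offer them.
            keep.append(name)
        elif (extchr == "~utf8") == bool(utf8_ok):
            # Offer UTF-8 entries only in UTF-8 mode, 8-bit entries only otherwise.
            keep.append(name)
    return keep


DICTS = [
    ("german-new", "~tex"),
    ("german-new8", "~latin1"),
    ("german-new8-utf8", "~utf8"),
]

if __name__ == "__main__":
    print(usable_dictionaries(DICTS, utf8_ok=True))
    print(usable_dictionaries(DICTS, utf8_ok=False))
```

With such a rule, ispell_change_dictionary would offer only one of "<language>" and "<language>-utf8" in any given session.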
--
Rafael