[Dict-common-dev] Re: Bug#278747: dictionaries-common: Please support iso-8859-15

Agustin Martin 278747@bugs.debian.org, Lionel Elie Mamane <lionel@mamane.lu>, dict-common-dev@lists.alioth.debian.org
Thu, 4 Nov 2004 13:39:43 +0100


I mistakenly sent the draft I was elaborating while looking into this
problem. I am rewriting it, since I want to cc this message to the
dict-common-dev mailing list. Sorry for the duplicate.


On Sat, Oct 30, 2004 at 11:26:59PM +0200, Lionel Elie Mamane wrote:
> On Fri, Oct 29, 2004 at 03:55:43PM +0200, Agustin Martin wrote:
> > Note that this does not mean that you cannot spellcheck iso-latin-15
> > documents,
>
> (Nitpick: ISO 8859-15 = ISO latin 9 or latin 0.)

It seems I was writing too fast ;-)

>
> It does make it impossible to spell-check words with non-ASCII
> characters in an iso latin 9-encoded file in my emacs; the accented
> characters are not considered as word characters. E.g. if I write
> "écouter", the spell-checker spell-checks "couter", not the whole
> word. I tried to recompile the ifrench-gut package, changing the
> coding line in the dictionaries-info file to iso-8859-15, and then it
> "just works" (after restarting emacs).
>

This seems to be an emacs problem. If I test spellchecking under emacs
on a utf-8 encoded file using an iso-8859-1 dict, everything works fine,
and that is how I tested things after your bug report. I have now
retested, making sure that emacs considers my file as latin0 encoded
(chars >127 do not even display; I did not bother installing the extra
fonts package), and I can reproduce the above behavior.

>
> I think that Emacs considers the "é" of latin-1 not to be the same
> character as the "é" of latin-9. At least, it gives me huge headaches
> when copy-pasting from a latin-1 file to a latin-9 file: I have to
> retype all accented characters, else it says that the file doesn't
> encode as latin9.
>

Yes, this seems to be the 'emacs unicode support is not yet complete' problem.
See e.g. bug report #130397,

http://bugs.debian.org/130397

Although you are the original reporter of that bug, it seems that you were
not cc'ed on the later mails (including mine). There is no fix there and,
FYI, I did not receive any further reply from upstream.

I expect that once this problem is addressed upstream the above method
should work, although not for the 'oe' (œ) character, which does not
exist in iso-8859-1.
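For what it is worth, the two encodings themselves agree on "é", which supports the view that the mismatch is in emacs' internal handling. A minimal Python sketch (Python is used here only for illustration; it does not model emacs' internal character representation):

```python
# "é" has the same byte value in latin1 and latin9 (latin0),
# and decodes to the same Unicode code point from either encoding.
ch = "é"
b1 = ch.encode("iso-8859-1")    # latin1
b15 = ch.encode("iso-8859-15")  # latin9 / latin0
print(b1, b15)                  # b'\xe9' b'\xe9'

# Decoding the latin9 bytes as latin1 (and vice versa) gives back the same character.
print(b15.decode("iso-8859-1") == ch, b1.decode("iso-8859-15") == ch)  # True True
```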

In the meantime, replacing latin1 by latin0 in the dictionaries is not
as clean as it seems. I have hacked the spanish dict as iso-8859-15, tested
a file declared as latin1, and found the same error as above, but reversed.
This means that if a dict declares itself as iso-8859-15 to emacs,
spellchecking from latin1 environments will be broken under emacs in the
same way as above, plus in this case some 'Ispell misalignment' errors.
Worse still, it gives the same errors when the file is utf-8 encoded (and
I made sure it is considered as such), which did not happen when the dict
was declared as iso-8859-1.

Setting an extra emacs entry with latin0 will help in the short
term, but might add some confusion once the above is fixed. However, I am
afraid we have no other way to allow spellchecking of latin0 documents.
Once the emacs problem is fixed, the additional latin0 entry should replace
the original one, keeping the original name but with latin0 encoding
declared.

Since emacs20 does not have the latin0 encoding, either dicts using latin0
or dictionaries-common should probably conflict with emacs20. Not something
I like, but there is probably no other choice. This will not
go to sarge, so there is time for emacs20 to largely disappear from users'
systems in the meantime.

> > Since the euro symbol is unlikely to be present in an ispell dict, for
> > the ispell dict iso-latin-15 is in practice equivalent to
> > iso-latin-1,
>
> Not for French; the euro symbol is not the only difference between
> latin 1 and latin 9. Relevant for French is the replacement of
> I-don't-remember-what by the "o written in the e" (o dans l'e), namely
> œ (if you see it correctly). This character is used e.g. in cœur
> (heart).
>

I see now that Finnish also uses s-hat and z-hat (š, ž), present in latin0
but not in latin1.
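Since the point hinges on exactly which characters differ between the two encodings, here is a small Python sketch that compares the standard library's "iso-8859-1" and "iso-8859-15" codecs byte by byte. It is only an illustration of the encodings, not of what ispell or emacs do with them; the helper name latin1_latin9_differences is made up for this example. It should report eight positions: the euro sign, Š/š, Ž/ž, Œ/œ and Ÿ.

```python
import unicodedata

def latin1_latin9_differences():
    """Hypothetical helper: (byte, latin1_char, latin9_char) for every byte that differs."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        c1 = raw.decode("iso-8859-1")
        c15 = raw.decode("iso-8859-15")
        if c1 != c15:
            diffs.append((value, c1, c15))
    return diffs

# Print Unicode names so the output stays plain ASCII.
for value, c1, c15 in latin1_latin9_differences():
    print(f"0x{value:02X}: {unicodedata.name(c1)} -> {unicodedata.name(c15)}")

# "cœur" can be encoded in latin9 but not in latin1.
print("cœur".encode("iso-8859-15"))  # b'c\xbdur'
try:
    "cœur".encode("iso-8859-1")
except UnicodeEncodeError:
    print("not encodable as iso-8859-1")
```

Every other byte, including all the accented letters used in French and Spanish apart from œ, decodes identically in both encodings.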

I am cc'ing this message to the dict-common-dev mailing list, to find out
what other developers involved in spellchecking think about this.

Thanks for your feedback

-- 
Agustin