[Dict-common-dev] Re: Bug#321040: fixed in bgoffice 3.0-5

Tue Sep 27 13:35:53 UTC 2005

[Adding dict-common-dev at lists.alioth.debian.org to the list of recipients.]

On Tue, Sep 27, 2005 at 02:20:27PM +0200, Agustin Martin wrote:
> On Wed, Sep 21, 2005 at 04:32:06PM -0700, Anton Zinoviev wrote:
> 
> > Changes: 
> >  bgoffice (3.0-5) unstable; urgency=low
> ...
> >    * Files /etc/emacs21/site-start.d/90{aspell-bg,ibulgarian}.el to
> >      codepage-setup cp1251.  It is still not clear to me how to support
> >      spelling of Bulgarian UTF-8 texts in Emacs.
> 
> This should be handled internally by most {x}emacs if
> buffer-file-coding-system is set to the actual encoding instead of
> 'undecided' or equivalent.  Notably, xemacs21-nomule does not support
> that. ispell.el will recode that UTF-8 to the encoding declared by
> the dictionary when sending strings and the other way back when
> receiving them. That should be transparent to the user, unless the
> original UTF-8 has characters that cannot be recoded to the single
> byte encoding, leading to misalignment errors (like in #205516).

For me this works only for 8-bit coding systems. :-( For utf-8 encoded
buffers "M-x ispell-buffer" works only on words that do not contain
non-Latin1 letters.  The other words (i.e. all for a non-Latin
language) are simply skipped.  (I can observe this because the
Bulgarian dictionary for aspell accepts both the Bulgarian and the
English words - an advantage of Bulgarian being a non-Latin language.)

There is also another weird problem I'd like to ask about.  I found it
to be reproducible for all non-ISO-8859-1 dictionaries for aspell, for
example aspell-pl (Latin2) and aspell-bg (Cyrillic).  I have the
following setup in my ~/.emacs:

(custom-set-variables
  '(ispell-program-name "aspell")
  '(ispell-dictionary "bulgarian")) ; or "polish"

Then I load a file and run "M-x ispell-buffer".  The result is

Ispell misalignment: word `ZP' point 169; probably incompatible versions

However, if I manually select the Bulgarian (resp. Polish) dictionary with
"M-x ispell-change-dictionary" there is no problem (for 8-bit coding
systems, that is).  With ispell the default dictionary works fine; only
aspell requires setting the dictionary manually for every buffer.

I have not set up a language environment for Emacs.  I work in a
UTF-8 locale and when I want to open a non-UTF-8 document I use "C-x
RET c coding_system RET C-x C-f".

> >    * Add entries for different Emacs versions in ibulgarian.info-ispell and
> >      aspell-bg.info-aspell.  Thanks to Ivan Raikov, closes: #321040.
> 
> Seems that xemacs21 also does not support cp1251. The summary seems to be
> 
> emacs20: nothing
> emacs21: cp1251
> emacs22: cp1251, windows-1251
> xemacs21: windows-1251
> 
> I would forget emacs20, which was not even shipped with sarge (and whose
> iso-8859-1 entry was wrong), and concentrate on keeping only the cp1251
> entry, which also matches aspell.

The package language-env used to trick Emacs20 into believing that the
user works with ISO 8859-1 while setting up a CP1251 font.  That's why
there is an iso-8859-1 entry for a Cyrillic language.  But you are right:
Emacs20 is not important any more.

> The only problem (emacs20 discarded) is with xemacs21, and it seems to be
> easily fixable by defining cp1251 as an alias to windows-1251 for xemacs.
> I can add that in an initialization file.
> 
> I have seen another problem in the ispell entry name. While all utf-8
> entries I tried displayed as raw chars in my latin1 environment when used
> in a debconf prompt, showing all chars, the bulgarian entry seems to only
> show the first char (as a 3 byte UTF-8 char) and nothing of the remaining
> chars.

There are only 2-byte UTF-8 chars there, but the fourth byte is \212
(octal), which is not a printable character in ISO 8859-1.
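
To illustrate the byte sequence: assuming the entry name begins with
"бъ" (as in "български"; the actual debconf string is not quoted in this
thread), the first two characters encode to four bytes, the last of which
is \212.  A minimal sketch, where the function name first_bytes_octal is
made up for illustration:

```shell
# Print the bytes of a UTF-8 string in octal, space separated on one line.
# The sample string below is an assumption, not the actual debconf entry.
first_bytes_octal() {
    printf '%s' "$1" | od -An -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

first_bytes_octal 'бъ'
echo
```

This should print "320 261 321 212": two 2-byte sequences, with \212 as
the fourth byte.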

> I do not have a clear position on this last point. When the use of UTF-8
> was introduced in policy, it seemed that all UTF-8 chars would be displayed as
> multibyte sequences in single-byte encodings, leaving in the worst case the
> English translation readable. But this case confuses me; we should probably
> suggest first trying some sort of 7-bit 'native' transliteration when possible
> instead of directly suggesting the use of UTF-8, or at least using something
> like
> 
> 7bit_transliteration [UTF-8_native_name] (english translation)
> 
> when utf8 is used. I hope that would at least make the 7bit_transliteration
> readable in the worst case, when something in the utf8 string confuses
> whiptail (but I did not check that). This seems to not affect readline or
> gnome frontends. Another possibility would be to leave things as they
> currently are, expecting utf8 support be improved in the meantime.
> 
> What do you think?

I think the best solution is to insert the following command somewhere:

iconv -c -futf-8 -t`locale charmap`
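
As a rough sketch of the effect: -c makes iconv drop characters that
cannot be represented in the target charset, so a name in the
"transliteration [native] (English)" form suggested above stays readable.
The sample string and the function name to_charset are made up for
illustration, and the target charset is passed explicitly here instead of
being taken from `locale charmap`, so that the result does not depend on
the current locale:

```shell
# Convert UTF-8 text on stdin to the given charset, silently dropping
# characters that cannot be represented there (iconv -c).
# Some iconv implementations exit non-zero when -c drops something,
# hence the "|| true".
to_charset() {
    iconv -c -f UTF-8 -t "$1" || true
}

# Hypothetical entry name; the Cyrillic part cannot be shown in Latin1.
printf '%s\n' 'bulgarski [български] (Bulgarian)' | to_charset ISO-8859-1
```

In a Latin1 environment the Cyrillic part is removed and the rest of the
line passes through unchanged.  In a real script the target would be
`locale charmap`, as in the command above.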

Anton Zinoviev