[Dict-common-dev] Re: Bug#321040: fixed in bgoffice 3.0-5
Anton Zinoviev
anton at lml.bas.bg
Tue Sep 27 13:35:53 UTC 2005
[Adding dict-common-dev at lists.alioth.debian.org to the list of recipients.]
On Tue, Sep 27, 2005 at 02:20:27PM +0200, Agustin Martin wrote:
> On Wed, Sep 21, 2005 at 04:32:06PM -0700, Anton Zinoviev wrote:
>
> > Changes:
> > bgoffice (3.0-5) unstable; urgency=low
> ...
> > * Files /etc/emacs21/site-start.d/90{aspell-bg,ibulgarian}.el to
> > codepage-setup cp1251. It is still not clear to me how to support
> > spelling of Bulgarian UTF-8 texts in Emacs.
>
> This should be internally handled by most {x}emacs if
> buffer-file-coding-system is set to the encoding instead of
> 'undecided' or equivalent. Notably xemacs21-nomule does not support
> that. ispell.el will recode that UTF-8 to the encoding declared by
> the dictionary when sending strings and the other way back when
> receiving them. That should be transparent to the user, unless the
> original UTF-8 has characters that cannot be recoded to the single
> byte encoding, leading to misalignment errors (like in #205516).
For me this works only for 8-bit coding systems. :-( For UTF-8 encoded
buffers "M-x ispell-buffer" works only on words that do not contain
non-Latin1 letters. The other words (i.e. all words in a non-Latin
language) are simply skipped. (I can observe this because the
Bulgarian dictionary for aspell accepts both Bulgarian and
English words - an advantage of Bulgarian being a non-Latin language.)
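To illustrate the recoding step discussed above: whether a UTF-8 text
fits into a single-byte dictionary encoding at all can be checked
outside Emacs with iconv. This is only a rough sketch. The function
name can_recode is made up for the example, and it does not claim to
reproduce what ispell.el does internally.

```shell
# Hypothetical helper (not part of ispell.el or aspell): succeed only if
# the UTF-8 string $1 can be converted without loss to the encoding $2.
can_recode() {
    printf '%s' "$1" | iconv -f UTF-8 -t "$2" >/dev/null 2>&1
}

# Two Cyrillic letters fit in CP1251; adding the Polish letter "ł"
# makes the conversion fail, because CP1251 has no such character.
can_recode "бъ" CP1251 && echo "recodable"
can_recode "бъł" CP1251 || echo "not recodable"
```

A text that fails this check is the kind of input that could lead to
the misalignment errors mentioned in the quoted paragraph.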
There is also another weird problem I'd like to ask about. I found it
to be reproducible for all non-ISO-8859-1 dictionaries for aspell, for
example aspell-pl (Latin2) and aspell-bg (Cyrillic). I have the
following setup in my ~/.emacs:
(custom-set-variables
 '(ispell-program-name "aspell")
 '(ispell-dictionary "bulgarian")) ; or "polish"
Then I load a file and run "M-x ispell-buffer". The result is
Ispell misalignment: word `ZP' point 169; probably incompatible versions
However, if I manually select the Bulgarian (resp. Polish) dictionary with
"M-x ispell-change-dictionary" there is no problem (that is, for 8-bit
coding systems). With ispell the default dictionary works fine; only
aspell requires setting the dictionary manually for every buffer.
I have not set up a language environment for Emacs. I work in a
UTF-8 locale and when I want to open a non-UTF-8 document I use "C-x
RET c coding_system C-x C-f".
> > * Add entries for different Emacs versions in ibulgarian.info-ispell and
> > aspell-bg.info-aspell. Thanks to Ivan Raikov, closes: #321040.
>
> Seems that xemacs21 also does not support cp1251. The summary seems to be
>
> emacs20: nothing
> emacs21: cp1251
> emacs22: cp1251, windows-1251
> xemacs21: windows-1251
>
> I would forget emacs20, which was not even shipped with sarge (and whose
> iso-8859-1 entry was wrong), and concentrate on keeping only the cp1251
> entry, which also matches aspell.
The package language-env used to trick Emacs20 into believing that the
user works with ISO 8859-1 while setting up a CP1251 font. That's why
there is an iso-8859-1 entry for a Cyrillic language. But you are
right - Emacs20 is not important any more.
> The only problem is (emacs20 discarded)
> with xemacs21, and seems to be easily fixable defining cp1251 as an alias to
> windows-1251 for xemacs. I can add that in an initialization file.
>
> I have seen another problem in the ispell entry name. While all utf-8
> entries I tried displayed as raw chars in my latin1 environment when used
> in a debconf prompt, showing all chars, the bulgarian entry seems to only
> show the first char (as a 3 byte UTF-8 char) and nothing of the remaining
> chars.
There are only 2 byte UTF-8 chars there but the fourth byte is \212
and is not part of ISO 8859-1.
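The byte sequence is easy to inspect with od. The sketch below assumes
that the entry name starts with the letters "бъ" (as in "български");
the function name utf8_octal_bytes is made up for the example.

```shell
# Hypothetical helper: print the bytes of the string $1 in octal,
# separated by single spaces.
utf8_octal_bytes() {
    printf '%s' "$1" | od -An -b | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Assumption: the name begins with "бъ". Each letter takes two bytes:
# 320 261 for "б" and 321 212 for "ъ", so the fourth byte is \212.
utf8_octal_bytes "бъ"
```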
> I do not have a clear position on this last point. When the use of utf8
> was introduced in policy, it seemed that all utf8 chars were to be displayed as
> multibyte chains in single byte encodings, leaving in the worst case the
> english translation readable. But this case confuses me; we should probably
> suggest trying first some sort of 7bit 'native' transliteration when possible
> instead of directly suggesting the use of UTF8, or at least using something
> like
>
> 7bit_transliteration [UTF-8_native_name] (english translation)
>
> when utf8 is used. I hope that would at least make the 7bit_transliteration
> readable in the worst case, when something in the utf8 string confuses
> whiptail (but I did not check that). This seems to not affect readline or
> gnome frontends. Another possibility would be to leave things as they
> currently are, expecting utf8 support be improved in the meantime.
>
> What do you think?
I think the best solution is to insert the following command somewhere:
iconv -c -futf-8 -t`locale charmap`
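As a rough sketch of how that could be wrapped (the function name
to_locale_charset and the sample string are made up here, and where
exactly such a filter would be hooked in is left open):

```shell
# Hypothetical helper: convert the UTF-8 string $1 to the charset $2
# (default: the charset of the current locale), silently dropping
# characters that cannot be represented there (-c).
to_locale_charset() {
    target="${2:-$(locale charmap)}"
    # Some iconv implementations exit non-zero when -c drops characters;
    # the converted output is still usable, so the status is ignored.
    printf '%s' "$1" | iconv -c -f UTF-8 -t "$target" || true
}

# Example: in an ASCII-only environment only the English part should survive.
to_locale_charset "български (Bulgarian)" ASCII
echo
```

With a Cyrillic target charset such as CP1251 the native name would be
converted instead of dropped.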
Anton Zinoviev