[Dict-common-dev] Re: Bug#321040: fixed in bgoffice 3.0-5

Wed Sep 28 21:02:38 UTC 2005

On Tue, Sep 27, 2005 at 04:35:53PM +0300, Anton Zinoviev wrote:
> On Tue, Sep 27, 2005 at 02:20:27PM +0200, Agustin Martin wrote:
> > This should be internally handled by most {x}emacs if
> > buffer-file-coding-system is set to the encoding instead of
> > 'undecided' or equivalent.  Notably xemacs21-nomule does not support
> > that. ispell.el will recode that UTF-8 to the encoding declared by
> > the dictionary when sending strings and the other way back when
> > receiving them. That should be transparent to the user, unless the
> > original UTF-8 has characters that cannot be recoded to the single
> > byte encoding, leading to misalignment errors (like in #205516).
> 
> For me this works only for 8-bit coding systems. :-( For utf-8 encoded
> buffers "M-x ispell-buffer" works only on words that do not contain
> non-Latin1 letters.  The other words (i.e. all for a non-Latin
> language) are simply skipped.  (I can observe this because the
> Bulgarian dictionary for aspell accepts both the Bulgarian and the
> English words - an advantage of Bulgarian being a non-Latin language.)

I have tested this in sarge and sid; it seemed to work in sid but failed
in sarge. My guess is that Cyrillic had the same unification problems
iso-latin had, with MULE representing the same character differently
internally depending on the font. I did the tests with the ibulgarian
README file, both in utf-8 and recoded to cp1251. Cyrillic characters
show properly in the cp1251 file, but in the utf-8 one (after explicitly
declaring the buffer as utf-8) they show as squares, even though the
characters on screen should be the same. I am not sure whether another
installed package (mule-ucs?) could make a difference.

> 
> There is also another weird problem I'd like to ask about.  I found it
> to be reproducible for all non-ISO-8859-1 dictionaries for aspell, for
> example aspell-pl (Latin2) and aspell-bg (Cyrillic).  I have the
> following setup in my ~/.emacs:
> 
> (custom-set-variables
>   '(ispell-program-name "aspell")
>   '(ispell-dictionary "bulgarian")) ; or "polish"

Does

(custom-set-variables
  '(ispell-program-name "ispell")      ; or "aspell"
  '(ispell-local-dictionary "polish")) ; or "bulgarian"

work?

> 
> Then I am loading a file and do "M-x ispell-buffer".  The result is
> 
> Ispell misalignment: word `ZP' point 169; probably incompatible versions
> 
> However, if I manually select the Bulgarian (resp. Polish) language
> with "M-x ispell-change-dictionary" there is no problem (this is for
> 8-bit coding systems).  With ispell the default dictionary works fine;
> only aspell requires setting the dictionary manually for every buffer.
> 
...
> 
> The package language-env used to trick Emacs20 into assuming that the
> user works with ISO 8859-1 while setting up a CP1251 font.  That's why
> there is an iso-8859-1 entry for a Cyrillic language.  But you are
> right - Emacs20 is not important any more.

Nice to know.

> 
> > The only remaining problem (emacs20 aside) is with xemacs21, and it
> > seems easily fixable by defining cp1251 as an alias for windows-1251
> > in xemacs. I can add that in an initialization file.
> > 
> > I have seen another problem in the ispell entry name. While all the
> > other utf-8 entries I tried were displayed as raw characters in my
> > latin1 environment when used in a debconf prompt, with every character
> > shown, the bulgarian entry seems to show only its first character (as
> > a 3-byte UTF-8 character) and nothing of the remaining characters.
> 
> There are only 2-byte UTF-8 characters there, but the fourth byte is
> \212 (octal, i.e. 0x8A), which is not a printable character in ISO 8859-1.
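
To see where such a byte comes from, here is a small Python sketch (not part of any Debian tooling) that lists the bytes of a UTF-8 string falling in 0x80-0x9F, the C1 control range that ISO 8859-1 does not use for printable characters. The sample word "български" is an assumption standing in for the Bulgarian entry name, which is not quoted in this thread. For that word the fourth byte is indeed 0x8A (octal 212), the second byte of the letter "ъ"; a Latin-1 terminal treating it as a control code might plausibly stop rendering after the three preceding bytes, which would fit the display described above, though that is only a guess.

```python
from __future__ import annotations


def c1_bytes(text: str) -> list[tuple[int, int]]:
    """Return (1-based position, byte value) for every byte of the UTF-8
    encoding of `text` that falls in 0x80-0x9F, the C1 control range that
    ISO 8859-1 does not use for printable characters."""
    data = text.encode("utf-8")
    return [(i + 1, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


if __name__ == "__main__":
    # Hypothetical sample, assumed to resemble the Bulgarian entry name.
    word = "български"
    # Show the full UTF-8 byte sequence in hex.
    print(" ".join(f"{b:02x}" for b in word.encode("utf-8")))
    # Report each C1 byte in hex and in the octal notation used above.
    for pos, value in c1_bytes(word):
        print(f"byte {pos}: 0x{value:02X} (octal \\{value:o})")
```

Every Cyrillic letter in U+0400-U+04FF encodes as a lead byte 0xD0-0xD3 followed by a continuation byte in 0x80-0xBF, so roughly half of such continuation bytes land in the C1 range.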

I have seen that this does not happen with the readline or gnome
frontends, only with the dialog frontend when whiptail is used; funnily,
it works when the dialog frontend is forced to use the dialog program. So
the problem is somewhat limited.

> > What do you think?
> 
> I think the best solution is to insert the following command somewhere:
> 
> iconv -c -futf-8 -t`locale charmap`
> 
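
The effect of that pipeline can be previewed with a rough Python analogue. This is a minimal sketch, assuming the entry names are available as UTF-8 bytes; the function name `recode_for_terminal` is hypothetical and not part of dictionaries-common or debconf. As with `iconv -c`, characters the target charset cannot represent are dropped instead of causing an error, while invalid UTF-8 input still raises here. Looking up the locale charset relies on `locale.nl_langinfo`, so that default path is Unix only.

```python
from __future__ import annotations

import locale


def recode_for_terminal(raw: bytes, target: str | None = None) -> bytes:
    """Rough analogue of `iconv -c -f utf-8 -t TARGET`.

    Decodes `raw` as UTF-8 (invalid input raises UnicodeDecodeError) and
    re-encodes it to `target`, silently dropping characters the target
    charset cannot represent, similar to what `-c` does for iconv.
    """
    if target is None:
        # Approximates `locale charmap` (Unix only): adopt LC_CTYPE from
        # the environment, then ask for the name of its codeset.
        locale.setlocale(locale.LC_CTYPE, "")
        target = locale.nl_langinfo(locale.CODESET)
    return raw.decode("utf-8").encode(target, errors="ignore")


if __name__ == "__main__":
    sample = "word български".encode("utf-8")  # hypothetical entry text
    for charset in ("UTF-8", "CP1251", "ISO-8859-1"):
        print(charset, recode_for_terminal(sample, charset))
```

Passing an explicit charset keeps the result independent of the caller's locale. Dropping characters silently is the same trade-off `-c` makes: a purely Cyrillic entry would come out empty in a Latin-1 locale rather than garbled.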

Thanks for the suggestion,

That is desirable, but at first sight not very straightforward,
considering the interaction with debconf and how we gather values from
the different dicts/wordlists. I have thought a bit about it, but what I
had in mind requires some debconf black magic, and I am still unsure
whether it would preserve the previous functionality without forcing the
debconf questions to be asked over and over again. Still, I would like to
experiment with this.

In the meantime I have tested whiptail with the real string (with the \212
included, not stripped by a copy/paste), and it seems to display something
reasonable for normal text, but fails for menu entries containing
non-displayable characters.

I will prepare a sample and mail the whiptail maintainer, probably as a
low-priority bug report.

Cheers,

-- 
Agustin