[Dict-common-dev] Re: Bug#321040: fixed in bgoffice 3.0-5

Fri Sep 30 19:17:32 UTC 2005

On Fri, Sep 30, 2005 at 02:47:03PM +0300, Anton Zinoviev wrote:
> On Wed, Sep 28, 2005 at 11:02:38PM +0200, Agustin Martin wrote:
> > 
> > I have tested that in sarge and sid, and seemed to work in sid, but failed
> > in sarge.
> 
> I upgraded my system to latest sid but I cannot see a difference.
> "M-x ispell-buffer" still checks the ASCII Latin letters but skips the
> Cyrillic (and probably the non-ASCII Latin letters also).

Do you have mule-ucs installed? This seems to be the real difference between
the sid and sarge boxes I used. I have just installed mule-ucs on my home
sarge box, and it seems to work now (since I do not speak Bulgarian, by
"work" I mean that it flags complete Cyrillic words as misspelled). Please
confirm whether this also works for you; if so, it deserves a note in the
dictionaries-common README.emacs file.

> 
> > My guess is that cyrillic had the same unification problems as
> > iso-latin had, the same character being represented differently
> > inside mule depending on the font.
> 
> What has changed for iso-latin?  Is the change somehow connected with
> the Emacs functions unify-8859-on-decoding-mode and
> unify-8859-on-encoding-mode?

For emacs 21.3 and above I added a patch from emacs-cvs, related to
ucs-mule-8859-to-mule-unicode. For xemacs and earlier emacs versions, when a
dict is iso-8859-1 and buffer-file-coding-system is iso-8859-15, ispell.el is
fooled into thinking the dict is iso-8859-15. In practice this means nothing
is reencoded on the fly, which works well unless chars specific to
iso-8859-15 are used (like the French 'oe' as a single char), in which case
misalignments appear. This trick also does not work with utf8 buffers. In
the dictionaries-common sources that is

debian/patches/470_ispell.el_fixlatin0-1.dpatch

For these emacs versions, mule-ucs does not seem to be required for
utf8->latin-{0,1}.

> 
> A copy/paste error.  I had used
> 
> (custom-set-variables
>   (ispell-program-name "aspell")
>   (ispell-dictionary "bulgarian")) ; or "polish"
> 
> > Does
> > 
> > (custom-set-variables
> >   '(ispell-program-name "ispell")      ; or "aspell"
> >   '(ispell-local-dictionary "polish")) ; or "bulgarian"
> 
> There are no problems with ispell.  Only aspell doesn't work properly
> with respect to this.

Make sure that what you set is 'ispell-local-dictionary', not
'ispell-dictionary'. Also, there is rudimentary support in the
dictionaries-common .el files (debian-ispell.el) for guessing the default
aspell dictionary from the value of the LANG environment variable. It does
not support things like 'es_ES:en', only a single entry, so you might want
to try

$ LANG=bg_BG emacs ...

and see if it works without setting ispell-local-dictionary. Again,
I will try next week, with more bandwidth and a sid box.
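
A minimal sketch of how a colon-separated value could be reduced to a single
entry before starting emacs. The helper name first_locale is hypothetical and
is not part of dictionaries-common; it only relies on POSIX parameter
expansion.

```shell
#!/bin/sh
# first_locale is a hypothetical helper, not part of dictionaries-common.
# It keeps only the first entry of a colon-separated locale list and
# leaves a single-entry value unchanged.
first_locale () {
    printf '%s\n' "${1%%:*}"
}

first_locale 'es_ES:en'
first_locale 'bg_BG'

# Possible use (assumption): LANG=$(first_locale "$LANG") emacs ...
```

The two calls print es_ES and bg_BG respectively, so a value like 'es_ES:en'
would reach the guessing code as a single entry.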

> 
> I think the questions will be asked again only when the encoding on
> the console changes.  Here is another solution (untested):
> 
> from_utf8 () {
>     iconv -c -futf-8 -t`locale charmap`
> }
> 
> to_utf8 () {
>     iconv -c -f`locale charmap` -tutf-8 
> }

Note that you will lose non-displayable chars in from_utf8(), and you will
not be able to recover them afterwards in to_utf8().
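
The loss can be seen with a small round trip. This is only a sketch: the
functions below are variants of from_utf8/to_utf8 that take the charset as an
argument (an assumption made so the result does not depend on the current
locale), and ASCII stands in for a console charset without Cyrillic.

```shell
#!/bin/sh
# Variants of from_utf8/to_utf8 taking the charset as $1 (an assumption;
# the quoted functions use `locale charmap` instead).
from_utf8 () {
    # -c omits characters that cannot be represented in the target charset
    iconv -c -f utf-8 -t "$1"
}

to_utf8 () {
    iconv -c -f "$1" -t utf-8
}

original='bulgarian (български)'
roundtrip=$(printf '%s' "$original" | from_utf8 ASCII | to_utf8 ASCII)

printf 'original:  %s\n' "$original"
printf 'roundtrip: %s\n' "$roundtrip"
```

The second line no longer contains the Cyrillic word, and nothing in
to_utf8() can bring it back, which is why the mapping has to be remembered
somewhere else.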

Also, all the parts using that in dictionaries-common are written in perl,
not in shell script. What I was thinking of was a wrapper around the input
function, doing essentially (fully untested)

 $choices  = get ($question,"choices");
 $value    = get ($question);
 # Backticks keep the trailing newline, so chomp both results.
 chomp ($charmap  = `locale charmap`);
 chomp ($lchoices = `echo "$choices" | iconv -c -futf-8 -t$charmap`);

 @utf8_choices  = split(', ',$choices);
 @local_choices = split(', ',$lchoices);

Build a direct hash (%direct) and a reverse one (%reverse) with the mappings
between both arrays,

direct   utf8  -> local
reverse  local -> utf8

so

 subst ($question,"choices",$lchoices);
 set   ($question,$direct{$value});
 input($question);
 set ($question,$reverse{get($question)});
 subst ($question,"choices",$choices);
 go();

should do the final work.

This also needs some error handling code, so that in case of error the input
function is called as is currently done, and it should also exit early if the
question has already been seen. Something like this should work and provide
consistent naming across the different debconf frontends, and is what I have
in mind. The drawback is that I do not like to play that much with debconf.

Cheers,

-- 
Agustin