[Dict-common-dev] Default wordlist selection by locale

Agustin Martin agustin.martin at hispalinux.es
Fri Sep 2 14:47:54 UTC 2005


On Fri, Sep 02, 2005 at 01:41:36PM +0100, Colin Watson wrote:
> Hi,
> 

Hi, Colin

> I've got a hairy problem with installation of dictionaries as part of
> the Ubuntu installation process.
> 
> Part of our policy in the Ubuntu installer is to ask as few questions as
> possible, and in particular to avoid questions in the second stage
> (base-config). In most cases the second stage needs to ask no questions
> at all, although occasionally X can't figure out the screen resolution
> and has to ask for that. However, since we upgraded to the new
> dictionaries-common (currently 0.49.2) and dictionaries as part of the
> aspell 0.60 transition, installations have been asking which dictionary
> should be the default.

Strange, the code handling that should be mostly the same in sarge as in
0.49.2.

> 
> We install some set of dictionaries as dependencies of the
> language-support-* packages, depending on the selected locale; for
> example, language-support-en depends on wamerican and wbritish. In fact,
> language-support-en is always installed in addition to any other
> locale-specific language support package, so there are always multiple
> dictionaries and a question will always be asked.

That is true if you install the language-support package separately, but it
should not happen on a first install from scratch (or at the base-config
stage) together with the dictionaries-common package, unless there is not
even a fallback match.

> 
> Now, there are various ways I could get around this:
> 
>   * In the short term I'm just going to drop the priority of the
>     wordlist question to medium in Ubuntu; unfortunately, that leaves
>     /etc/dictionaries-common/words (and thus /usr/share/dict/words) as a
>     dangling symlink, which is obviously bad.

See below

> 
>   * I could change dc-debconf-select.pl to select a wordlist arbitrarily
>     in the event that none was explicitly selected. That doesn't produce
>     very good results, though, especially in case wamerican (say)
>     manages to sort before the appropriate wordlist for the primary
>     language the user selected.

Same here, see below

> 
>   * We often have better information about the default wordlist than
>     just the language part of the locale; if the user selected en_US,
>     then wamerican should really be the default wordlist, but if they
>     selected en_GB then it should be wbritish. I could have an enormous
>     lookup table in localechooser or something that selects a default
>     wordlist.
> 
>   * Putting this all in localechooser is pretty nasty, though; the set
>     of available wordlist packages could change at any time, and I don't
>     want to have to keep up with it. How about having each wordlist
>     package declare some kind of a priority for various locales (e.g.
>     wamerican could be en_US:10, en_*:5, wbritish could be en_GB:10,
>     en_ZA:9, en_*:5, etc.)? Then something in dictionaries-common could
>     select a good default in case the user didn't explicitly select one,
>     and all the information would reside in individual packages rather
>     than in the installer, which is generally a good plan.
> 
> Does this make any kind of sense to any dictionary maintainers, or am I
> missing something that lets me get good results already?

dictionaries-common.config should already give those results at the base
installation stage.

That is where the pre-seeding is done, based on the
"debian-installer/language" and "debian-installer/country" debconf values.
If a reasonable value is found it is pre-seeded, and the question priority
is set to low, so that only control maniacs will see it.

If something is going wrong there, that is the place to fix it (the
installed dictionaries-common.config is really the concatenation of
dictionaries-common.config and dc-debconf-select.pl). At the base
installation stage, when only the config scripts are run but packages are
not yet installed, that is the script that will be run (the config scripts
in the dicts/wordlists will do nothing because the dc-debconf-select.pl
script is not yet installed).

Code there should try guessing the default ispell dictionary/wordlist from
the debian-installer settings, or from the previous symlinks if upgrading
from woody, with different priorities depending on the quality of the
result:

a) Try an exact match
                             -> set value with question priority low
b) Try a reasonable fallback (e.g., en_GB is selected and no British dict
   is installed, but an American one is)
                             -> set value with question priority medium
c) Try an English variant
                             -> set value with question priority medium
d) None of the above
                             -> ask question with priority critical

Note that while the priorities in cases (b)-(d) look high, in practice they
should not result in a debconf question being asked except in very special
setups, since most systems will have a single ispell dictionary/wordlist
installed. Also, if values were previously set, nothing will be changed.

This should leave the question pending at critical priority only in setups
where, e.g., the language is es_ES and no Spanish or English dict is to be
installed, but German and French ones are.
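
As a rough illustration of the (a)-(d) cascade described above, here is a
minimal Python sketch. It is not the actual code (the real logic lives in
dictionaries-common.config / dc-debconf-select.pl and is written in Perl);
the function name guess_default, the shape of the "installed" mapping and the
wordlist names used here are all made up for the example.

```python
from typing import Dict, Optional, Tuple


def guess_default(locale: str,
                  installed: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (wordlist, question priority) for the given locale.

    `installed` is a hypothetical mapping from wordlist name to the locale
    that wordlist targets, e.g. {"american": "en_US"}.
    """
    language = locale.split("_")[0]
    names = sorted(installed)  # sorted only to make the result deterministic

    # (a) exact match: value is set, question priority low
    for name in names:
        if installed[name] == locale:
            return name, "low"

    # (b) reasonable fallback: same language, different country
    for name in names:
        if installed[name].split("_")[0] == language:
            return name, "medium"

    # (c) any English variant
    for name in names:
        if installed[name].split("_")[0] == "en":
            return name, "medium"

    # (d) nothing suitable: the question must be asked
    return None, "critical"


if __name__ == "__main__":
    print(guess_default("en_GB", {"american": "en_US", "british": "en_GB"}))
    print(guess_default("en_GB", {"american": "en_US"}))
    print(guess_default("es_ES", {"ngerman": "de_DE", "french": "fr_FR"}))
```

With es_ES selected and only German and French wordlists in the mapping, the
sketch returns (None, "critical"), which is the special setup mentioned
above.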

Removing dictionaries-common along with any other package depending on it,
and reinstalling them all together with something like

# DICT_COMMON_DEBUG="yes" apt-get install language-support-en

should give a lot of information about the guessing process. As a matter of
fact,

# dpkg --purge --force-depends dictionaries-common
# DICT_COMMON_DEBUG="yes" apt-get install dictionaries-common

on an already installed system should also give relevant information.

With that info, we can try guessing what is going wrong, and look for a fix.

Cheers,

-- 
Agustin


