[Dict-common-dev] Aspell hash autobuilding

Agustin Martin agustin.martin@hispalinux.es
Thu, 7 Jul 2005 16:00:49 +0200


On Wed, Jul 06, 2005 at 04:05:22AM +0300, Kęstutis Biliūnas wrote:
> Hi,
> 
> I also have made an attempt to do so. See:
> http://kebil.ghost.lt/debian/pool/ispell-lt/
> 
> There the myspell-lt package is used as the data package for
> autobuilding the aspell-lt and ilithuanian packages. So now
> aspell-lt and ilithuanian depend on the
> myspell-lt (= ${Source-Version}) package.
> 
> Result:
> 
> .deb         | before  | now
> -------------|---------|--------
> myspell-lt   |  308KB  | 308KB
> ilithuanian  |  313KB  | 24.5KB
> aspell-lt    |  707KB  | 3.2KB
>  +(*12arch.) | 8484KB  |   0KB
> -------------|---------|--------
>        All:  | 9812KB  | 335.7KB
> 
> How about such a method?

Hi,

At first glance I see an advantage in repository size, but I think it will
take more space on the user's side. It forces aspell-lt-only or
ilithuanian-only users to have myspell-lt installed as well. Since the
myspell wordlist is uncompressed, that can be unnecessarily large.

The other possibility is to repeat the wordlist in each package. That is
rather non-optimal, but would give approx 300KB for each package (which would
be an improvement for aspell), so ~900KB for all. However, this would be nicer
for users who have only ilithuanian or aspell-lt installed.

The other drawback I see in your system is that it complicates the postinsts,
although some things might be improved there. Since both systems have
benefits and problems, I do not have a clear position on this. It seems to
me that the best is to try both. Depending on users' reaction they can coexist
in different packages or not. Also note that some languages use different
wordlists for ispell and aspell, so this cannot be applied globally.

Some things in the postinsts could be done differently:

> +++ ispell-lt-1.1+20050601/debian/ilithuanian.postinst

> sed '/[0-9][0-9]*/d' /usr/share/myspell/dicts/lt_LT.dic > \
>     /usr/share/ispell/lietuviu.mwl

I do not know the sed internals, but sed '1d' is probably faster: it discards
the first line and passes the rest through without further parsing.
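
As a small illustration of the difference between the two forms (the sample
file and its entries below are made up, not taken from lt_LT.dic): '1d'
removes exactly the leading count line, while the pattern form removes every
line that contains a digit. The two only give the same result if no word in
the list contains a digit.

```shell
# Hypothetical sample in myspell .dic layout: the first line is the entry count.
dic=$(mktemp)
printf '%s\n' 3 abc/A mp3 xyz > "$dic"

# Address form: delete line 1 only and pass everything else through.
by_line=$(sed '1d' "$dic")

# Pattern form from the postinst: delete every line that contains a digit.
by_pattern=$(sed '/[0-9][0-9]*/d' "$dic")

printf 'sed 1d:\n%s\n\nsed pattern:\n%s\n' "$by_line" "$by_pattern"
rm -f "$dic"
```

In this sample the made-up entry "mp3" survives '1d' but is dropped by the
pattern form.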

> +++ ispell-lt-1.1+20050601/debian/aspell-lt.postinst

> +	cat /usr/share/myspell/dicts/lt_LT.aff > \
>          /usr/lib/aspell-0.60/lt_affix.dat

Why not a symlink? That would reduce the aspell-lt size a bit more.
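
A minimal sketch of the symlink variant, run against a scratch directory
instead of the real /usr tree so that it can be tried safely. The scratch
root and the placeholder file content are assumptions for illustration only;
in the postinst the paths would be the real ones quoted above.

```shell
# Scratch directory standing in for the filesystem root (assumption).
root=$(mktemp -d)
mkdir -p "$root/usr/share/myspell/dicts" "$root/usr/lib/aspell-0.60"

# Placeholder content, not the real Lithuanian affix file.
echo 'placeholder affix data' > "$root/usr/share/myspell/dicts/lt_LT.aff"

# Link instead of "cat ... >": the data stays only in the myspell-lt file.
link="$root/usr/lib/aspell-0.60/lt_affix.dat"
ln -sf "$root/usr/share/myspell/dicts/lt_LT.aff" "$link"

# Record what we got before cleaning up.
if [ -L "$link" ]; then is_link=yes; else is_link=no; fi
content=$(cat "$link")
echo "symlink: $is_link, content: $content"
rm -rf "$root"
```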

> sed '/[0-9][0-9]*/d' /usr/share/myspell/dicts/lt_LT.dic > \
>          /usr/lib/aspell-0.60/lt.wl

Same as above with '1d'

> +	cat /usr/lib/aspell-0.60/lt.wl | LC_COLLATE=C sort -u | \
>         /usr/bin/prezip-bin -z > /usr/lib/aspell-0.60/lt.cwl
> +	/usr/bin/prezip-bin -d < /usr/lib/aspell-0.60/lt.cwl | \
>         /usr/bin/aspell  --lang=lt create master /usr/lib/aspell-0.60/lt.rws
> +	rm -f /usr/lib/aspell-0.60/lt.wl
> +	rm -f /usr/lib/aspell-0.60/lt.cwl

I think that if the myspell wordlist is badly sorted, that affects both
myspell and aspell, if you use affix compression for both. If so, better to
sort it for myspell. Also, I am not sure how the ispell sort order relates to
these.
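
Whether the list needs sorting at all can be checked without rewriting it.
The sketch below uses a made-up three-word list, not the real lt.wl, and sets
LC_ALL=C rather than LC_COLLATE=C because LC_ALL takes precedence when both
are set in the environment.

```shell
# Hypothetical wordlist: out of order in the C locale and with a duplicate.
wl=$(mktemp)
printf '%s\n' abc Zebra abc > "$wl"

# sort -c only checks the order; it exits non-zero at the first disorder.
if LC_ALL=C sort -c "$wl" 2>/dev/null; then
    sorted_before=yes
else
    sorted_before=no
fi

# Same sort as in the postinst, byte order with duplicates removed.
sorted=$(LC_ALL=C sort -u "$wl")

printf 'already sorted: %s\n%s\n' "$sorted_before" "$sorted"
rm -f "$wl"
```

If the check passes on the shipped myspell list, the sort step in the
postinst adds nothing.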

Also, I do not think that using prezip here is an advantage. Maybe just

$ cat /usr/lib/aspell-0.60/lt.wl | \
  /usr/bin/aspell  --lang=lt create master /usr/lib/aspell/lt.rws

or even directly

$ cat /usr/share/myspell/dicts/lt_LT.dic | sed '1d' | \
   /usr/bin/aspell  --lang=lt create master /usr/lib/aspell/lt.rws

Cheers,

-- 
Agustin