[Dict-common-dev] Huge Catalan dictionary

Jordi Mallach jordi at debian.org
Sat Mar 25 16:17:38 UTC 2006


I've spent some hours thinking about how to solve:
#345242: aspell-ca: reports hyphenated and apostrophed words as

It's not trivial. Many versions ago, I asked Agustín for ideas on how to
reduce the large size of my generated dictionaries. He suggested that I
could remove a few rules from my .aff file, and that did indeed produce
a reasonably sized dictionary. Unfortunately, losing the words that were
removed from the resulting dictionary is quite annoying.

I tried adding some of the rules back, but the dictionary still grows
quite a bit. I've been discussing this with my upstream dictionary
maintainer, and he suggests removing some rules from the .aff file and
then hacking around the generated wordlist to make things work, even
though those hacks suck a bit.

The "100% correct" aspell dictionary is nearly 200 megabytes, as it
includes a lot of variations for hyphenated and apostrophed words, which
is mainly what was getting removed in the past.
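To make the growth concrete, here is a toy sketch in Python. It assumes made-up word and clitic lists (WORDS, PROCLITICS, ENCLITICS are hypothetical, not taken from the real aspell-ca .aff file) and does not check grammaticality. It only shows that a fully expanded wordlist must store every apostrophed or hyphenated combination as a separate entry, so each extra level of hyphenated chaining multiplies the size.

```python
from itertools import product

# Hypothetical toy data, NOT the real aspell-ca affix rules.
WORDS = ["estima", "obre", "agrada", "escolta"]
PROCLITICS = ["l'", "m'", "t'", "s'", "n'"]  # elided forms before a vowel
ENCLITICS = ["-la", "-lo", "-les", "-los", "-ho",
             "-ne", "-hi", "-me", "-te", "-se"]

def expand(words, proclitics, enclitics, max_chain):
    """Return the set of surface forms a flat wordlist would have to list.

    Each word appears bare, with one elided proclitic, or followed by a
    chain of 1..max_chain hyphenated enclitics. Grammaticality is not
    checked; this only counts combinations.
    """
    forms = set()
    for w in words:
        forms.add(w)
        forms.update(p + w for p in proclitics)
        for n in range(1, max_chain + 1):
            for chain in product(enclitics, repeat=n):
                forms.add(w + "".join(chain))
    return forms

for depth in range(3):
    print(depth, len(expand(WORDS, PROCLITICS, ENCLITICS, depth)))
```

With these made-up lists it prints 24, 64 and 464 forms for chain depths 0, 1 and 2: four words become hundreds of entries once two-clitic chains are allowed. The real figures depend entirely on the actual .aff rules.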

My question is: has anyone else faced a similar problem with other
languages? Is a 200 MB dictionary reasonable? I'd guess Italian and
French have similar rules, so I'd like to think there are solutions
to this.

Jordi Mallach Pérez  --  Debian developer     http://www.debian.org/
jordi at sindominio.net     jordi at debian.org     http://www.sindominio.net/
GnuPG public key information available at http://oskuro.net/