[Debian-arabic-packages] The future of aspell-ar package
kaplan at debian.org
Tue Sep 26 19:31:47 UTC 2006
I'm forwarding this discussion to the list. The main issue is the future
of aspell-ar in Debian.
See the summary at the end of the mail.
>>>>>> 2. What do you think about maintaining aspell-ar in this group?
>>>>> Sure I don't. I'd just love to get it out of testing because the wordlist isn't that good.
>>>>> I've prepared a -2 package but wasn't uploaded yet.
>>>> I didn't understand your answer "Sure I don't". Was there a mistake here?
>>> Oops, "Sure I don't mind". It'd be better I guess.
>>>> Anyway, the main idea of this group is to:
>>>> a. Have all the packages in one central place, with many eyes checking each package.
>>>> b. Have the Debian developers in the group automatically sponsor the
>>>> packages for those who aren't DD.
>>>> As we still have a month till the Etch package freeze, I'd prefer to
>>>> make the effort to prepare the -2 package and have it included in Etch. This
>>>> means that we should upload by mid-October.
>>> -2 contains some small fixes for the packaging part but not for the wordlist itself.
>>> The wordlist is bad. I've built it from some Arabic sources I found but it's
>>> not that good. That's why I was thinking about not shipping it with etch.
>>> I guess we can continue this discussion on the list to get all the members in the loop ?
>> Sure. Feel free to write to the list.
>> Did you check the wordlist at
>> Copyright 2006 Google Inc.
>> Ethan Bradford <ethanb at google com>
>> Gokalp Yapici <gokalpy at google com>
>> The original word list used for this package was generated
>> using The Buckwalter Arabic Morphological Analyzer Version 1.0.
>> Maybe we should package it instead of your wordlists ?
> I've talked to them before. The problem with that wordlist is that it's based on the
> Buckwalter data. It contains ancient words and words from the Quran.
> The ancient words are not bad but they are not commonly used. They increase the size of
> the wordlist for the benefit of maybe less than 10% of the users.
> When the list was converted to myspell, OpenOffice ate 90-200MB RAM (Can't really
> The Quranic words are only used in the Quran and it's considered an error by the modern
> All of the above is considered a feature by Ethan and Gokalp.
> Gokalp accounts for word correctness by using the Google frequency count, which
> is not a good thing IMHO because a lot of Arabic websites don't spell correctly (when
> it comes to hamza and such things).
> Ethan believes that Tim Buckwalter's opinion is more important than a native speaker's
> I don't know whether the list is 100% error free or not. I can't check it. It's huge
> and I don't really know how to add new words to it.
> That's why I'm objecting to the wordlist.
> There are some people working on a wordlist for hunspell. I didn't have enough time to
> check with them but it seems that they are doing a good job. I'm sure the hunspell list can be
> ported to aspell somehow.
> Here's the URL (It's in Arabic) http://perso.menara.ma/~kebdani/ayaspell-dic/
> What do you think ?
Summary of the discussion:
1. Mohammed Sameer is the maintainer of aspell-ar in Debian.
2. The package needs a -2 version for small fixes (easy part).
3. Mohammed thinks the word lists (taken from
http://foolab.org/projects/arspell) for this package aren't good enough.
4. He thinks we shouldn't provide aspell-ar in Etch.
5. On the Aspell FTP site (ftp://ftp.gnu.org/gnu/aspell/dict/ar/) there's a
very large word list made by Google. Mohammed thinks it isn't
up to date enough and contains words that are too old.
6. There's a third project that makes word lists
(http://perso.menara.ma/~kebdani/ayaspell-dic/). As far as I saw from
the site, the word lists aren't ready yet.
7. Debian will have a package freeze by the end of October. This means
we have 3 weeks to prepare the aspell-ar package and upload, in order to
have it included in Etch.
Issues for decision:
1. Should we ship an aspell-ar package in Etch?
2. If yes to #1, which word lists should we use?
3. There's an option to package both.
kaplan at debian.org
C644 D0B3 92F4 8FE4 4662 B541 1558 9445 99E8 1DA0