Lior Kaplan kaplan at debian.org
Tue Sep 26 19:31:47 UTC 2006


I'm forwarding this discussion to the list. The main issue is the future
of aspell-ar in Debian.

See the summary at the end of the mail.

>>>>>> 2. What do you think about maintaining aspell-ar in this group?
>>>>> Sure I don't. I'd just love to get it out of testing because the wordlist isn't that good.
>>>>> I've prepared a -2 package but it hasn't been uploaded yet.
>>>> I didn't understand your answer "Sure I don't". Was there a mistake here?
>>> Oops, "Sure I don't mind". It'd be better I guess.
>>>> Anyway, the main idea of this group is to:
>>>> a. Have all the packages in one central place, with many eyes checking each package.
>>>> b. Have the Debian developers in the group automatically sponsor the
>>>> packages for those who aren't DD.
>>>> As we still have a month till the Etch package freeze, I'd prefer to make
>>>> the effort to prepare the -2 package and have it included in Etch. This
>>>> means that we should upload by mid-October.
>>> -2 contains some small fixes for the packaging part but not for the wordlist itself.
>>> The wordlist is bad. I've built it from some Arabic sources I found but it's
>>> not that good. That's why I was thinking about not shipping it with etch.
>>> I guess we can continue this discussion on the list to get all the members in the loop ?
>> Sure. Feel free to write to the list.
>> Did you check the wordlist at
>> ftp://ftp.gnu.org/gnu/aspell/dict/ar/aspell6-ar-1.2-0.tar.bz2
>> Copyright 2006 Google Inc.
>>                Ethan Bradford <ethanb at google com>
>>                Gokalp Yapici <gokalpy at google com>
>>   The original word list used for this package was generated
>> using The Buckwalter Arabic Morphological Analyzer Version 1.0.
>> Maybe we should package it instead of your wordlists ?
> I've talked to them before. The problem with that wordlist is that it's based on the
> Buckwalter data. It contains ancient words and words from the Quran.
> The ancient words are not bad but they are not commonly used. They increase the size of
> the wordlist for the benefit of maybe less than 10% of the users.
> When the list was converted to myspell, OpenOffice ate 90-200MB RAM (Can't really
> remember).
> The Quranic words are only used in the Quran and are considered errors in the modern
> language.
> All of the above is considered a feature by Ethan and Gokalp.
> Gokalp accounts for the word correctness by using the Google frequency count, which
> is not a good thing IMHO because a lot of Arabic websites don't spell correctly (when
> it comes to hamza and such things).
> Ethan believes that Tim Buckwalter's opinion is more important than a native speaker's
> opinion.
> I don't know whether the list is 100% error free or not. I can't check it. It's huge
> and I don't really know how to add new words to it.
> That's why I'm objecting to the wordlist.
> There are some people working on a wordlist for hunspell. I didn't have enough time to
> check with them but it seems that they are doing a good job. I'm sure the hunspell list can be
> ported to aspell somehow.
> Here's the URL (It's in Arabic) http://perso.menara.ma/~kebdani/ayaspell-dic/
> What do you think ?

Summary of the discussion:
1. Mohammed Sameer is the maintainer of aspell-ar in Debian.
2. The package needs a -2 version for small fixes (easy part).
3. Mohammed thinks the word lists (taken from
http://foolab.org/projects/arspell) for this package aren't good enough.
4. He thinks we shouldn't provide aspell-ar in Etch.
5. On Aspell's FTP site (ftp://ftp.gnu.org/gnu/aspell/dict/ar/) there's a
very large word list made by Google. Mohammed thinks it isn't up to date
enough and contains too many old words.
6. There's a third project that makes word lists
(http://perso.menara.ma/~kebdani/ayaspell-dic/). As far as I saw from
the site, the word lists aren't ready yet.
7. Debian will have a package freeze by the end of October. This means
we have 3 weeks to prepare the aspell-ar package and upload, in order to
have it included in Etch.

Issues for decision:
1. Should we ship an aspell-ar package in Etch?
2. If yes to #1, which word lists should we use?
3. There's an option to package both.


Lior Kaplan
kaplan at debian.org

GPG fingerprint:
C644 D0B3 92F4 8FE4 4662  B541 1558 9445 99E8 1DA0

