[Debtags-devel] Using Python NLTK for tag generation [was: AI for tag generation].

Gustavo Franco <gustavorfranco@gmail.com>
Thu, 30 Sep 2004 17:35:32 -0300


Hi list,

I've joined the debtags group on Alioth and, after doing some tagging,
I'm now researching how we can improve the current system.

Here is what I have in mind:

- python-debtags: I'm planning to write Python bindings for
libdebtags. I suspect that will be impossible with Pyrex, but I can
try SWIG or something more complicated. The bindings are needed to
work well with NLTK (see below), and of course other code can use
them too.

- Python Natural Language Toolkit [0]: The Bayesian idea sounds great,
but we can use a library like NLTK to do the classification easily, as
you can see by looking at the documentation. OK, that way we won't be
starting from a spam filter and just needing to write some hacks here
and there, but I think NLTK will be more flexible and useful for us.
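
To make the Bayesian idea a bit more concrete, here is a minimal
plain-Python sketch of naive Bayes classification over package
descriptions. It is only an illustration under assumptions: the
NaiveBayesTagger class, the training descriptions and the tag names
("mail", "game") are all made up, and none of this is NLTK's or
libdebtags' API. A toolkit like NLTK would replace the hand-written
counting.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a description into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical helper: suggests one tag per package description."""

    def __init__(self):
        self.tag_counts = Counter()               # descriptions seen per tag
        self.word_counts = defaultdict(Counter)   # word frequencies per tag
        self.vocabulary = set()

    def train(self, description, tag):
        self.tag_counts[tag] += 1
        for word in tokenize(description):
            self.word_counts[tag][word] += 1
            self.vocabulary.add(word)

    def scores(self, description):
        """Return the log-probability score of every known tag."""
        words = tokenize(description)
        total_docs = sum(self.tag_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for tag, count in self.tag_counts.items():
            total_words = sum(self.word_counts[tag].values())
            score = math.log(count / total_docs)  # prior for this tag
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not
                # drive the probability to zero.
                seen = self.word_counts[tag][word]
                score += math.log((seen + 1) / (total_words + vocab_size))
            result[tag] = score
        return result

    def suggest(self, description):
        scores = self.scores(description)
        return max(scores, key=scores.get)


# Made-up training data, for illustration only.
tagger = NaiveBayesTagger()
tagger.train("text mode mail reader with imap support", "mail")
tagger.train("mail transport agent for delivering mail", "mail")
tagger.train("arcade game with space ships", "game")
tagger.train("puzzle game for the console", "game")

print(tagger.suggest("graphical mail reader"))
```

In practice the training data would come from packages that are
already tagged, and a suggestion would still need review by a human
tagger.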

By the way, once python-debtags is started it will be maintained in
arch (revision control). I'll keep it in-house for some time, and
after that I'll request a repository at arch.d.o. Comments?

I want to start writing python-debtags this weekend, and maybe
package python-nltk too so I can run some tests. I'd like to hear
suggestions and criticism before then. :)

[0] = http://nltk.sourceforge.net/

Thanks,
Gustavo Franco -- <stratus@acm.org>