[Debtags-devel] Using Python NLTK for tag generation [was: AI for tag generation].

Gustavo Franco <gustavorfranco@gmail.com>
Fri, 1 Oct 2004 14:02:32 -0300


Hi,

I guess that NLTK is more useful than that, but I'd prefer to code
something and see whether the results fit our needs. Maybe you know
more about NLTK than I do? If not, please check this URL:
http://www-106.ibm.com/developerworks/linux/library/l-cpnltk.html.

NLTK can be used to:
- Split the package descriptions (corpora) of already tagged packages into tokens;
- Associate these tokens with their tags;
- Take some package descriptions of non-tagged packages and try to
classify them using the text classification support already in NLTK.
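
To make the three steps above concrete, here is a rough sketch. It is
plain Python with a tiny naive Bayes scorer, not NLTK's own classifier
API, and the descriptions, the tag names and the function names
(tokenize, train, suggest_tags) are all made up for illustration:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(description):
    """Step 1: lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", description.lower())


def train(tagged_packages):
    """Step 2: associate tokens with the tags of the packages they describe."""
    token_counts = defaultdict(Counter)  # tag -> token -> occurrences
    tag_counts = Counter()               # tag -> number of descriptions
    for description, tags in tagged_packages:
        tokens = tokenize(description)
        for tag in tags:
            tag_counts[tag] += 1
            token_counts[tag].update(tokens)
    return token_counts, tag_counts


def suggest_tags(description, token_counts, tag_counts, top=1):
    """Step 3: rank tags for an untagged description (naive Bayes, add-one smoothing)."""
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    total_docs = sum(tag_counts.values())
    tokens = tokenize(description)
    scores = {}
    for tag, counts in token_counts.items():
        total = sum(counts.values())
        score = math.log(tag_counts[tag] / total_docs)  # log prior of the tag
        for tok in tokens:
            # Smoothed log likelihood of this token under this tag.
            score += math.log((counts[tok] + 1) / (total + len(vocabulary)))
        scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical training data: (description, tags) pairs of tagged packages.
TAGGED = [
    ("console mail reader with IMAP support", ["works-with::mail"]),
    ("graphical mail client for IMAP and POP3", ["works-with::mail"]),
    ("arcade game with fast paced levels", ["use::gameplaying"]),
    ("puzzle game for the console", ["use::gameplaying"]),
]

token_counts, tag_counts = train(TAGGED)
print(suggest_tags("simple IMAP mail reader", token_counts, tag_counts))
```

No grammar rules are involved here: the scorer only counts which words
show up under which tags, which is the kind of classification I have in
mind.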

Why would the hard work of writing grammar rules be needed here?

I agree with having a special tag too, but couldn't it be 'special:tag-verified'?

Thanks,
Gustavo Franco -- <stratus@acm.org>

On Fri, 1 Oct 2004 17:49:01 +0200, Erich Schubert
<erich.schubert@gmail.com> wrote:
> Hi,
> NLTK and similar natural-language-processing tools are mostly of use
> when you have a limited grammar you want to understand perfectly, e.g.
> parsing commands like "search for all movies directed by Steven
> Spielberg".
> This requires a lot of work writing grammar rules and such, but we
> can't actually use this grammar information. Therefore I don't think
> NLTK will help that much.
> 
> The suggestion by Enrico of a "special:completely-tagged" tag is
> sweet. I'd appreciate having this tag added to the vocabulary.
>