[Debtags-devel] Using Python NLTK for tag generation [was: AI for tag generation].
Erich Schubert
Erich Schubert <erich.schubert@gmail.com>
Fri, 1 Oct 2004 22:53:21 +0200
Hi,
> - Split the package descriptions (corpora) of already tagged packages in tokens;
A regular expression can do that, too, or just a normal tokenizer. No
need for NLTK for this.
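
As a minimal sketch of the regular-expression approach, using only the Python standard library: the pattern and the name `tokenize` are assumptions made for illustration, not something taken from this thread or from the debtags code.

```python
import re

# Assumed token definition: maximal runs of ASCII letters and digits.
# Punctuation and whitespace act purely as separators.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(description):
    """Return the lowercase tokens of a package description, in order."""
    return TOKEN_RE.findall(description.lower())

tokens = tokenize("Fast, lightweight mail client")
```

A package description would probably want a slightly richer pattern (for names such as "c++" or "gtk-2.0"), but that is a change to one line.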
> - Associate these tokens with their tags;
Trivial to do using associative hashes and arrays in C (glib),
C++ (STL) or Perl.
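
A sketch of that association with plain nested dictionaries, written in Python only for brevity. The function names, the sample descriptions, and the tag names below are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(description):
    # Assumed tokenizer: lowercase runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", description.lower())

def count_token_tags(tagged_packages):
    """Build a token -> {tag: count} table.

    tagged_packages is an iterable of (description, tags) pairs taken
    from packages that already carry tags.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for description, tags in tagged_packages:
        # set() so a token is counted once per package, not once per use.
        for token in set(tokenize(description)):
            for tag in tags:
                counts[token][tag] += 1
    return counts

# Hypothetical sample data, not real package metadata.
sample = [
    ("Fast mail client for GNOME", ["works-with::mail", "interface::x11"]),
    ("Console mail reader", ["works-with::mail", "interface::console"]),
]
counts = count_token_tags(sample)
```

Here `counts["mail"]` holds every tag seen together with the token "mail" and how many packages contributed it.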
> Why hard work writing grammar rules are needed here?
We don't care for natural language. In fact we are very interested in
the other metadata such as dependencies as well.
From what I can tell, NLTK has dozens of things we don't need or want
to use; the remaining parts are trivial to do yourself. So I don't see
much to gain here (except that we would need to use Python and introduce
a dependency...)
If you look at the URL you posted before, around listing 7 it starts
to go into real natural language processing, i.e. classification of
words into types (nouns, attributes...) and then construction of trees
from those using grammar rules.
Gruß,
Erich Schubert
--
erich@(mucl.de|debian.org) -- GPG Key ID: 4B3A135C (o_
To understand recursion you first need to understand recursion. //\
Wo befreundete Wege zusammenlaufen, da sieht die ganze Welt für V_/_
eine Stunde wie eine Heimat aus. --- Hermann Hesse