Further ideas for Debtags AI

Erich Schubert erich.schubert at gmail.com
Thu Jun 15 15:08:50 UTC 2006


Hi,
> If a single evaluation would take 1 second (seems to take ~3 atm), the
> time required to pre-compute the data (~555 tags, ~10000 packages) would
> take about two months (if my calculations are correct). I don't think

There must be ways to do that much more efficiently, even with more
complex AIs than we have right now. The training "has" to be the
time-consuming part, not the evaluation.
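
As a quick sanity check of the quoted estimate, here is the arithmetic as a small sketch. It assumes one evaluation per (tag, package) pair, which is my reading of the quoted numbers and not something stated explicitly; the function name is only for illustration.

```python
# Back-of-the-envelope check of the pre-computation time quoted above.
# Assumption: exactly one evaluation per (tag, package) pair.
TAGS = 555
PACKAGES = 10_000
SECONDS_PER_DAY = 86_400


def precompute_days(seconds_per_eval: float) -> float:
    """Total wall-clock days to evaluate every tag for every package."""
    return TAGS * PACKAGES * seconds_per_eval / SECONDS_PER_DAY


if __name__ == "__main__":
    for secs in (1, 3):
        print(f"{secs} s per evaluation: {precompute_days(secs):.1f} days")
```

At 1 second per evaluation this comes out at roughly 64 days, so "about two months" looks right; at the current ~3 seconds it would be roughly three times that.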

There are a couple of things that could be done; itemset mining is
one of them. The inference rules obtained by itemset mining could be
used to iteratively evaluate tags for a package, skipping tags that
seem unreasonable given earlier decisions.
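
A minimal sketch of what I mean, under several assumptions: the rule format (antecedent tags, candidate tag, co-occurrence confidence), the `min_conf` threshold and the `classify` callback are all hypothetical and only stand in for whatever the itemset miner and the AI actually provide. The confidence value in the usage note is made up.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical rule format: (antecedent tags, candidate tag, confidence that
# the candidate tag co-occurs with the antecedent in the training data).
Rule = Tuple[FrozenSet[str], str, float]


def evaluate_tags(
    package: str,
    tags: Iterable[str],
    rules: List[Rule],
    classify: Callable[[str, str], bool],
    min_conf: float = 0.05,
) -> Tuple[Set[str], int]:
    """Return the accepted tags and the number of expensive classify() calls."""
    accepted: Set[str] = set()
    calls = 0
    for tag in tags:
        # Skip the tag if some rule whose antecedent is already satisfied
        # says it almost never co-occurs with the tags accepted so far.
        unlikely = any(
            ante <= accepted and cons == tag and conf < min_conf
            for ante, cons, conf in rules
        )
        if unlikely:
            continue
        calls += 1  # only now pay for the expensive evaluation
        if classify(package, tag):
            accepted.add(tag)
    return accepted, calls
```

For example, with a single rule `(frozenset({"interface::commandline"}), "uitoolkit::gtk", 0.01)` and the tags evaluated in the order `interface::commandline`, `uitoolkit::gtk`, `role::program`, the second tag is never passed to `classify` once the first one has been accepted. The order of evaluation matters, so tags that appear in many rule antecedents should probably go first.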

Another approach could be to do a support/discrimination analysis of
words first, then filter the text using this data (e.g. by picking the
top 20 words wrt their support*discrimination value).
Then it should be possible to scale down the analysis costs by picking
suitable ranking numbers.
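
Roughly like the following sketch. The definitions are assumptions on my part, since nothing is fixed yet: "support" is taken as the fraction of all descriptions containing the word, and "discrimination" as the absolute difference between the word's frequency in descriptions with the tag and without it. The function names are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def top_words(docs: List[Tuple[Set[str], bool]], k: int = 20) -> List[str]:
    """Rank words by support * discrimination for one tag.

    docs: one (words in the description, package has the tag) pair per package.
    """
    n_pos = sum(1 for _, has_tag in docs if has_tag)
    n_neg = len(docs) - n_pos
    pos: Counter = Counter()
    neg: Counter = Counter()
    for words, has_tag in docs:
        (pos if has_tag else neg).update(words)

    scores: Dict[str, float] = {}
    for word in set(pos) | set(neg):
        # Assumed definition: share of all descriptions containing the word.
        support = (pos[word] + neg[word]) / len(docs)
        # Assumed definition: how differently the word behaves with/without the tag.
        p_pos = pos[word] / n_pos if n_pos else 0.0
        p_neg = neg[word] / n_neg if n_neg else 0.0
        scores[word] = support * abs(p_pos - p_neg)

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def filter_text(words: List[str], keep: Set[str]) -> List[str]:
    """Drop every word that is not in the selected top-k vocabulary."""
    return [w for w in words if w in keep]
```

The `k` parameter is the knob for trading accuracy against evaluation cost: the AI then only ever sees at most `k` distinct words per tag instead of the full description.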

If speed is still such a big issue, will you focus on the AI first,
before doing much on the database or UI side? (I'd still love
some mockups, though ;-) )

best regards,
Erich Schubert
--
    erich@(mucl.de|debian.org)      --      GPG Key ID: 4B3A135C    (o_
  To understand recursion you first need to understand recursion.   //\
  Wo befreundete Wege zusammenlaufen, da sieht die ganze Welt für   V_/_
        eine Stunde wie eine Heimat aus. --- Herrmann Hesse



More information about the Debtags-devel mailing list