Further ideas for Debtags AI

Peter Rockai me at mornfall.net
Tue Jun 13 20:22:08 UTC 2006


Hi, just a quick idea I got while reading the mail...

On Tue, Jun 13, 2006 at 09:20:04PM +0200, Erich Schubert wrote:
> Hi Alex,
> Independent of your schedule and such (btw, one of my exams was pushed
> back for a week, so I'll be busy for one more week), some ideas I'd
> love to see you research, too:
> - using Itemset mining for Debtags (might be useful in the tagger, too)
> - naive-bayes of 2nd order (i.e. see if it can improve the results
> when you don't do a plain naive bayes, but the "just as much naive"
> approach of also taking "has word A and word B" into the equations).

Please, please look at crm114 -- it is a collection of statistical text
discrimination algorithms. I don't have much idea how useful it will be for
tags, but it is definitely very accurate for spam :-). It is really a toolkit,
though, not a spam filter, so hopefully it is useful here. Sorry if someone has
already floated that idea, my memory is short and unreliable...
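
To make the "has word A and word B" idea from the quoted mail a bit more
concrete, here is a rough sketch in Python. It is only an illustration under
my own assumptions, not how crm114 or the Debtags tagger work: the names
features() and PairNaiveBayes, the "+" pair separator, and the tags "mail" and
"devel" are all made up for the example. It is a plain naive Bayes with
add-one smoothing where every unordered word pair counts as an extra feature.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def features(text):
    """Single words plus unordered word pairs ("has word A and word B")."""
    words = sorted(set(text.lower().split()))
    pairs = [a + "+" + b for a, b in combinations(words, 2)]
    return words + pairs


class PairNaiveBayes:
    """Hypothetical toy classifier: naive Bayes over words and word pairs."""

    def __init__(self):
        self.tag_counts = Counter()              # descriptions seen per tag
        self.feat_counts = defaultdict(Counter)  # tag -> feature -> count
        self.vocab = set()                       # every feature seen so far

    def train(self, text, tag):
        self.tag_counts[tag] += 1
        for f in features(text):
            self.feat_counts[tag][f] += 1
            self.vocab.add(f)

    def classify(self, text):
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n in self.tag_counts.items():
            # log prior plus add-one smoothed log likelihood of each feature
            score = math.log(n / total)
            denom = sum(self.feat_counts[tag].values()) + len(self.vocab)
            for f in features(text):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.feat_counts[tag][f] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


nb = PairNaiveBayes()
nb.train("graphical mail client", "mail")
nb.train("c compiler toolchain", "devel")
guess = nb.classify("mail client")
```

The pair features grow quadratically with the number of distinct words in a
description, so for anything real one would probably want to prune rare pairs.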

> - k-mode based clustering of packages
> - outlier detection (for detecting badly tagged packages)
> - other datamining algorithms ;-)
> 
> best regards,
> Erich Schubert

Yours, Peter.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com | +421907533216 
   http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation



More information about the Debtags-devel mailing list