Further ideas for Debtags AI
Peter Rockai
me at mornfall.net
Tue Jun 13 20:22:08 UTC 2006
Hi, just a quick idea I got while reading the mail...
On Tue, Jun 13, 2006 at 09:20:04PM +0200, Erich Schubert wrote:
> Hi Alex,
> Independent of your schedule and such (btw, one of my exams was pushed
> back for a week, so I'll be busy for one more week), some ideas I'd
> love to see you research, too:
> - using Itemset mining for Debtags (might be useful in the tagger, too)
> - naive-bayes of 2nd order (i.e. see if it can improve the results
> when you don't do a plain naive bayes, but the "just as much naive"
> approach of also taking "has word A and word B" into the equations).
Please, please, look at crm114: it is a collection of statistical text
discrimination algorithms. I have little idea how useful it will be for
tags, but it is definitely very accurate for spam :-). It is really a toolkit,
though, not a spam filter, so hopefully it will be useful here too. Sorry if
someone already floated that idea; my memory is short and unreliable...
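For comparison, the "2nd order" naive Bayes idea can be sketched in a few
lines: treat every unordered pair of words in a package description as an
extra feature next to the single words, then run ordinary naive Bayes on the
enlarged feature set. This is a minimal sketch, assuming each package is a
list of description words and the label says whether it should carry some
tag; the names `features` and `PairNaiveBayes` and the toy data are
hypothetical, and crm114 itself is not used here.

```python
import math
from collections import Counter
from itertools import combinations


def features(words, second_order=True):
    """Single words, plus unordered word pairs ("has word A and word B")."""
    vocab = sorted(set(words))
    feats = [(w,) for w in vocab]
    if second_order:
        # Pair features grow quadratically with the number of distinct words.
        feats.extend(combinations(vocab, 2))
    return feats


class PairNaiveBayes:
    """Hypothetical multinomial naive Bayes over word and word-pair features,
    with Laplace (add-alpha) smoothing."""

    def __init__(self, second_order=True, alpha=1.0):
        self.second_order = second_order
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        self.totals = Counter()
        vocab = set()
        for words, label in zip(docs, labels):
            feats = features(words, self.second_order)
            self.counts[label].update(feats)
            self.totals[label] += len(feats)
            vocab.update(feats)
        self.vocab_size = len(vocab)
        self.n_docs = len(labels)
        return self

    def log_score(self, words, c):
        """log P(c) + sum of log P(feature | c): the naive independence step."""
        score = math.log(self.prior[c] / self.n_docs)
        denom = self.totals[c] + self.alpha * self.vocab_size
        for f in features(words, self.second_order):
            score += math.log((self.counts[c][f] + self.alpha) / denom)
        return score

    def predict(self, words):
        return max(self.classes, key=lambda c: self.log_score(words, c))


# Toy, made-up data: description words and a hypothetical "text"/"other" label.
docs = [
    ["text", "editor", "vim"],
    ["text", "editor", "emacs"],
    ["image", "viewer"],
    ["image", "editor", "gimp"],
]
labels = ["text", "text", "other", "other"]
model = PairNaiveBayes().fit(docs, labels)
print(model.predict(["text", "editor"]), model.predict(["image", "editor"]))
# prints: text other
```

Setting `second_order=False` gives plain naive Bayes on the same data, which
makes the comparison Erich asks for easy; on real descriptions the pair
features would probably need pruning of rare pairs to stay manageable.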
> - k-mode based clustering of packages
> - outlier detection (for detecting badly tagged packages)
> - other datamining algorithms ;-)
>
> best regards,
> Erich Schubert
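The k-mode clustering and outlier detection points fit together: if each
package is a 0/1 vector over tags, k-modes (k-means with Hamming distance and
a per-tag majority vote as the cluster "mode") groups similarly tagged
packages, and a package far from its cluster's mode is a candidate for bad
tagging. A minimal sketch, assuming packages come as a dict of tag sets; the
package names and tag sets below are made up for illustration (not real
Debtags data), and distance-to-mode is just one simple outlier score.

```python
def hamming(a, b):
    """Number of positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def to_vectors(packages):
    """Turn {package: set_of_tags} into 0/1 vectors over the sorted tag universe."""
    tags = sorted(set().union(*packages.values()))
    names = sorted(packages)
    vectors = [tuple(int(t in packages[n]) for t in tags) for n in names]
    return names, tags, vectors


def k_modes(vectors, seeds, max_iter=20):
    """Basic k-modes for 0/1 vectors: Hamming distance, per-tag majority modes.
    `seeds` are the row indices used as initial modes (k = len(seeds))."""
    modes = [list(vectors[i]) for i in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assign every package to its nearest mode (ties go to the lower index).
        new_assignment = [
            min(range(len(modes)), key=lambda j: hamming(v, modes[j]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no package changed cluster
        assignment = new_assignment
        for j in range(len(modes)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # an empty cluster keeps its previous mode
                # A tag is in the mode if a strict majority of members have it.
                modes[j] = [int(2 * sum(col) > len(members)) for col in zip(*members)]
    return assignment, modes


def outlier_scores(vectors, assignment, modes):
    """Hamming distance from each package to its own cluster's mode."""
    return [hamming(v, modes[a]) for v, a in zip(vectors, assignment)]


# Made-up packages and tag sets, for illustration only (not real Debtags data).
packages = {
    "edit1": {"use::editing", "works-with::text"},
    "edit2": {"use::editing", "works-with::text"},
    "edit3": {"use::editing", "works-with::text", "interface::x11"},
    "view1": {"use::viewing", "works-with::image", "interface::x11"},
    "view2": {"use::viewing", "works-with::image", "interface::x11"},
    "view3": {"use::viewing", "works-with::image"},
    "weird": {"use::editing", "works-with::image", "interface::text-mode"},
}
names, tags, vectors = to_vectors(packages)
assignment, modes = k_modes(vectors, seeds=[0, 3])
scores = outlier_scores(vectors, assignment, modes)
print(max(zip(scores, names)))
# prints: (3, 'weird')
```

Seeding with fixed rows keeps the example deterministic; a real run would more
likely try several random seeds and keep the clustering with the lowest total
distance, and the score above which a package counts as "badly tagged" would
have to be tuned on the real tag database.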
Yours, Peter.
--
Peter Rockai | me()mornfall!net | prockai()redhat!com | +421907533216
http://blog.mornfall.net | http://web.mornfall.net
"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
-- Blair P. Houghton on the subject of C program indentation
More information about the Debtags-devel mailing list