[Debtags-devel] AI Tagger

Hanna M. Wallach hmw26 at cam.ac.uk
Sun Aug 14 20:23:47 UTC 2005


Hi Benjamin,

> I think the first thing that must be discussed is the purpose of the AI
> tagger. The first question is: do we need it at all? And the second: if
> so, what for?

I work on machine learning (specifically Bayesian techniques) and had
a brief chat with Enrico about the AI tagger for Debtags at
Debconf5. I think the idea of using machine learning techniques in
this way is very sensible, and I'm very keen to think about other ways
in which machine learning can be used for Debtags and whether there
are other techniques that might be well suited to this application.

> My answer for the first question is: we could give it a try, and see if
> it is useful. The use case I have in mind for the tagger is, offering a
> frontend where the maintainers can enter their packages and get a
> suggested set of tags for each of them.

Sounds sensible. How well does the tagger correspond with human
judgement? Have you done any evaluation of whether the tags it proposes
are the same as, similar to, or better than those proposed by a human?
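
One simple way to put a number on this would be to compare the
suggested tag set for each package against the human-assigned one using
precision and recall. The sketch below is only an illustration; the
function name and the example tags are made up and are not taken from
the tagger or from the Debtags vocabulary.

```python
def precision_recall(suggested, human):
    """Compare a suggested tag set against a human-assigned tag set.

    precision: fraction of suggested tags that the human also chose
    recall:    fraction of human-chosen tags that were suggested
    """
    suggested, human = set(suggested), set(human)
    hits = len(suggested & human)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(human) if human else 0.0
    return precision, recall


# Hypothetical tags for a single package.
p, r = precision_recall({"mail", "console", "x11"}, {"mail", "console"})
print(f"precision={p:.2f} recall={r:.2f}")
```

Averaging these two figures over a held-out set of already-tagged
packages would give a rough picture of how closely the tagger tracks
human judgement.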

Let me propose another way in which machine learning could be used, in
addition to the task your tagger is designed to solve: suggesting
packages that may be of interest to a user given some
subset of the packages they already have installed. Essentially this
is a clustering task and could make use of tags as input data.
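
As a rough illustration of using tags as input data, here is a minimal
sketch that ranks packages by tag overlap with what is already
installed. Note that this is a nearest-neighbour ranking by Jaccard
similarity rather than a full clustering algorithm, and that the
package-to-tag table, the tag names, and the function names are all
assumptions made up for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two tag sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(installed, tags_by_package, top_n=3):
    """Rank packages that are not installed by their best tag overlap
    with any installed package; packages with no overlap are dropped."""
    scores = {}
    for candidate, cand_tags in tags_by_package.items():
        if candidate in installed:
            continue
        best = max(
            (jaccard(cand_tags, tags_by_package[p])
             for p in installed if p in tags_by_package),
            default=0.0,
        )
        if best > 0:
            scores[candidate] = best
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical tag data, not real Debtags entries.
TAGS = {
    "mutt": {"mail", "console", "client"},
    "alpine": {"mail", "console", "client"},
    "thunderbird": {"mail", "x11", "client"},
    "gimp": {"graphics", "x11", "editor"},
    "inkscape": {"graphics", "x11", "editor", "vector"},
}

print(suggest({"mutt"}, TAGS))
```

A proper clustering approach would group all packages up front and
suggest from the user's clusters, but the input would be the same:
one tag set per package.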

> Now, a word towards how the tagger works
[...]

I'd love to have a more detailed conversation about the machine
learning details of your tagger -- specifically, exactly what
technique are you using? From the Debtags list archives (thanks for
forwarding me relevant links, Enrico!) it seems that you're using
a Naive Bayes-based technique (usually used in spam filtering). Am I
right about this?
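
For reference, this is the kind of thing I have in mind when I say
Naive Bayes: a minimal multinomial sketch in which each tag is treated
as a class and words from the package description are the features,
with add-one (Laplace) smoothing. It is not meant to describe how your
tagger is actually implemented; the class name, the training
descriptions, and the tags are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Toy multinomial Naive Bayes that scores each tag independently."""

    def __init__(self):
        self.tag_docs = Counter()                # tag -> number of packages
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.vocab = set()
        self.total_docs = 0

    def train(self, description, tags):
        words = description.lower().split()
        self.total_docs += 1
        self.vocab.update(words)
        for tag in tags:
            self.tag_docs[tag] += 1
            self.word_counts[tag].update(words)

    def suggest(self, description, top_n=2):
        # Words never seen in training carry no information, so skip them.
        words = [w for w in description.lower().split() if w in self.vocab]
        scores = {}
        for tag, n_docs in self.tag_docs.items():
            total = sum(self.word_counts[tag].values())
            # log P(tag) + sum of log P(word | tag), with add-one smoothing
            score = math.log(n_docs / self.total_docs)
            for w in words:
                count = self.word_counts[tag][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[tag] = score
        # Highest score first; ties broken alphabetically.
        return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical training data.
tagger = NaiveBayesTagger()
tagger.train("console mail reader", {"mail", "console"})
tagger.train("graphical mail client", {"mail", "x11"})
tagger.train("graphical image editor", {"graphics", "x11"})

print(tagger.suggest("mail reader"))
```

The same scoring rule is what a typical Bayesian spam filter uses, with
"spam" and "ham" as the only two classes instead of one class per tag.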

Regards,

-- 
hanna m. wallach
http://join-the-dots.org/


