Google Summer of Code

Alex de Landgraaf alex at
Tue May 30 23:50:32 UTC 2006

Enrico Zini wrote:
> In the meantime you can check out
> svn+ssh:// and have a look at the
> "process" script: that's where I tried to automate most of the things.
> Ask me any questions about it on IRC or on the list, and I'll answer as
> soon as I'm at the computer.

The tag-approval part of debtags seems like a good place to start, as it
sounds like it is what holds debtags back the most. I've played around a
bit with your process script; let me restate the steps to make sure I've
understood them (please correct me where I'm wrong):

- tags are added or removed via debtags submit (or the online
- the current tags are uploaded to /tags/tags-current.gz, probably a
daily database snapshot
- tagcoll is used to diff the tags between tags-current.gz and those in
Packages (the tags visible via debtags et al.)
- these differences are (currently manually) either approved and moved
into SVN (to be packaged, I presume) or rejected, in which case they are
removed from the central database
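
To make the diff step concrete, here is a minimal sketch of comparing two
tag collections. It assumes a simple "package: tag1, tag2" line format,
and parse_tags and diff_tags are hypothetical helper names. It is not the
tagcoll implementation or the actual process script, only an illustration
of what the review step takes as input.

```python
def parse_tags(text):
    """Parse lines like 'pkg: tag1, tag2' into {pkg: set_of_tags}.

    The line format is an assumption made for this sketch.
    """
    coll = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ':' only, since tags themselves contain '::'.
        pkg, _, tags = line.partition(":")
        coll.setdefault(pkg.strip(), set()).update(
            t.strip() for t in tags.split(",") if t.strip()
        )
    return coll


def diff_tags(current, packaged):
    """Return {pkg: (added, removed)} for packages whose tags differ."""
    changes = {}
    for pkg in sorted(set(current) | set(packaged)):
        cur = current.get(pkg, set())
        old = packaged.get(pkg, set())
        added, removed = cur - old, old - cur
        if added or removed:
            changes[pkg] = (added, removed)
    return changes
```

Each entry in the result is one candidate change that would have to be
approved or rejected.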

Thus a starting point would be to use the Bayesian classifier to
facilitate either approving or rejecting the new tags in the
tagcoll-diff. I've been trying out Ben's Bayesian tagger implementation
and have been surprised at the accuracy, although I might have simply
chosen simple tag/package combinations.
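
As a rough illustration of the idea (not Ben's implementation), the
following is a minimal naive Bayes sketch for a single tag. It assumes the
words of a package description are the features; TagClassifier, review and
the threshold value are hypothetical names and choices made for this
example only.

```python
import math
from collections import Counter


class TagClassifier:
    """Naive Bayes for one tag: does a description suggest the tag applies?"""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, description, has_tag):
        """Record one package description, labelled with whether it has the tag."""
        self.doc_counts[has_tag] += 1
        self.word_counts[has_tag].update(description.lower().split())

    def probability(self, description):
        """Estimate P(tag applies | description) using add-one smoothing."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            total_words = sum(self.word_counts[label].values())
            # Smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in description.lower().split():
                count = self.word_counts[label][word]
                # The extra +1 in the denominator reserves mass for unseen words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalise in log space to avoid underflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])


def review(classifier, description, threshold=0.9):
    """Suggest 'approve', 'reject', or 'manual' for a proposed tag addition."""
    p = classifier.probability(description)
    if p >= threshold:
        return "approve"
    if p <= 1 - threshold:
        return "reject"
    return "manual"
```

Anything the classifier is unsure about falls back to "manual", so it
would only speed up the existing review rather than replace it.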

If this was the general idea and Erich doesn't disagree, I'll see if I
can string together a proof-of-concept (having the classifier review the
changes for a single facet). Should be fun.



-- 
| Alex de Landgraaf            | The cure for boredom is curiosity |
| Student AI & CS, VU, A'dam   |  There is no cure for curiosity   |
| Phone: 06-16844084           |                                   |
