News on the central database

Enrico Zini enrico at enricozini.org
Fri Jun 22 19:36:52 UTC 2007


Hello,

after a nice discussion with Justin at DebConf7, I made a few
interesting changes to how the central database works.

It all started with the observation that the special::not-yet-tagged tags
do not always appear alone: sometimes they sit next to a few tags that
people threw in without feeling like marking the package as tagged.

Therefore we could interpret those tags as 'not edited by a human'.
Since the tags are only removed manually, we can assume that when they
disappear a human has had a look at the tag list, given it a polish and
said "ok".

Now, we have at least 4 working engines capable of suggesting tags for
packages:

 1. The autotagger inferring tags from package names, dependencies and
    so on;
 2. The AI tagger maintained by Erich, which has been waiting for
    something to do for quite a while;
 3. The Xapian-powered 'suggested packages' engine used in the tagging
    interface;
 4. The Supermarket engine that I put together during DebConf7 with the
    help of Alain Schroëder, which you can now see in action with the
    suggestions in the web tagging interface.

If you read that list of automated tagging engines and read the phrase
"not edited by a human", what is the first thought that comes to your
mind?  "Edited by a machine", of course.

So here comes the new organisation of the central database: packages
with the 'not-yet-tagged' tags get our four engines to throw tags at
them.  Packages without the 'not-yet-tagged' tags are left alone by our
automated engines, are maintained by people and are used to train the
engines.
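
As a rough illustration of that split, here is a minimal sketch. It is not
the actual central database code: the function names, the dictionary
layout and the prefix match on 'special::not-yet-tagged' are all
assumptions made for the example, and the engines are stand-in callables.

```python
# Hypothetical sketch: names and data layout are assumptions,
# not the real debtags central database code.
NOT_YET_TAGGED_PREFIX = "special::not-yet-tagged"


def is_human_reviewed(tags):
    """A package counts as reviewed once no not-yet-tagged marker remains."""
    return not any(tag.startswith(NOT_YET_TAGGED_PREFIX) for tag in tags)


def partition(db):
    """Split {package: set_of_tags} into (training, pending) dictionaries."""
    training, pending = {}, {}
    for pkg, tags in db.items():
        target = training if is_human_reviewed(tags) else pending
        target[pkg] = tags
    return training, pending


def suggest(pending, engines):
    """Run every engine on each pending package and merge the suggestions."""
    return {
        pkg: set().union(*(engine(pkg, tags) for engine in engines))
        for pkg, tags in pending.items()
    }
```

Reviewed packages end up in the training set and are never passed to the
engines; only packages still carrying a not-yet-tagged marker receive
machine suggestions.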

I've started updating the website accordingly, and I brought online
the bits of the autotagger that do not require scanning dependencies.
More will follow in the next few days, including a way to keep the tags
supplied by the automated engines from cluttering my manual review
queue or the Packages file.


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at debian.org>