News on the central database
Enrico Zini
enrico at enricozini.org
Fri Jun 22 19:36:52 UTC 2007
Hello,
after a nice discussion with Justin at DebConf7, I made a few
interesting changes to how the central database works.
It all started with the observation that the special::not-yet-tagged tags
do not always appear alone: sometimes they sit next to a few tags that
people threw in without feeling like marking the package as tagged.
Therefore we could interpret those tags as 'not edited by a human'.
Since the tags are only removed manually, we can assume that when they
disappear a human has had a look at the tag list, given it a polish and
said "ok".
Now, we have at least 4 working engines capable of suggesting tags for
packages:
1. The autotagger inferring tags from package names, dependencies and
so on;
2. The AI tagger maintained by Erich, which has been waiting for
something to do for quite a while;
3. The Xapian-powered 'suggested packages' engine used in the tagging
interface;
4. The Supermarket engine that I put together during DebConf7 with the
help of Alain Schroëder, which you can now see in action with the
suggestions in the web tagging interface.
If you read that list of automated tagging engines and read the phrase
"not edited by a human", what is the first thought that comes to your
mind? "Edited by a machine", of course.
So here comes the new organisation of the central database: packages
with the 'not-yet-tagged' tags get our four engines to throw tags at
them. Packages without the 'not-yet-tagged' tags are left alone by our
automated engines, are maintained by people and are used to train the
engines.
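
To make the split concrete, here is a rough Python sketch of the idea. It is
not the code running on the central database: the function names, the
engine signature and the toy name-based rule are all made up for
illustration, and I am assuming that every 'not-yet-tagged' tag starts with
"special::not-yet-tagged".

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumption: all 'not-yet-tagged' markers share this prefix.
NOT_YET_TAGGED_PREFIX = "special::not-yet-tagged"

TagDB = Dict[str, Set[str]]
# Hypothetical engine signature: (package name, current tags,
# human-maintained training data) -> suggested tags.
Engine = Callable[[str, Set[str], TagDB], Set[str]]


def is_machine_edited(tags: Set[str]) -> bool:
    """A package still carrying a not-yet-tagged marker was not reviewed by a human."""
    return any(tag.startswith(NOT_YET_TAGGED_PREFIX) for tag in tags)


def split_database(db: TagDB) -> Tuple[TagDB, TagDB]:
    """Split the database into (machine-edited, human-maintained) packages."""
    machine = {pkg: tags for pkg, tags in db.items() if is_machine_edited(tags)}
    human = {pkg: tags for pkg, tags in db.items() if not is_machine_edited(tags)}
    return machine, human


def apply_engines(db: TagDB, engines: List[Engine]) -> TagDB:
    """Let the engines add tags to machine-edited packages only.

    Human-maintained packages are copied unchanged and handed to every
    engine as training data.
    """
    machine, human = split_database(db)
    result = {pkg: set(tags) for pkg, tags in human.items()}
    for pkg, tags in machine.items():
        merged = set(tags)
        for engine in engines:
            merged |= engine(pkg, tags, human)
        result[pkg] = merged
    return result


def name_engine(pkg: str, tags: Set[str], training: TagDB) -> Set[str]:
    """Toy stand-in for an autotagger rule based on the package name.

    It ignores the training data; a real engine would learn from it.
    """
    return {"role::shared-lib"} if pkg.startswith("lib") else set()
```

Calling apply_engines(db, [name_engine]) on a database where "libfoo1" still
carries special::not-yet-tagged and "mutt" does not would add the toy tag to
"libfoo1" and leave the tags of "mutt" exactly as they were. The marker tag
itself stays in place, since only a human is supposed to remove it.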
I've started updating the website along these lines and have brought online
the bits of the autotagger that do not require scanning dependencies.
More will follow in the next few days, including a way to keep the tags
supplied by the automated engines from cluttering my manual review
queue or the Packages file.
Ciao,
Enrico
--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at debian.org>
More information about the Debtags-devel mailing list