[Debtags-devel] A first use of the bayesian tagger

Enrico Zini enrico at enricozini.org
Tue Oct 25 20:42:06 UTC 2005


Hello,

I needed to do a very tedious task: verifying the tag patch from the
packagebrowser for inclusion in the Packages file.  It's a 1-megabyte tag
patch.  Aargh.

I needed help.  But it's so tedious that I can't ask anyone without
feeling guilty.

Who's good at tedious tasks?  Computers are.  So I need an artificial
intelligence, and we happen to have one.

Step one: make it work.

  apt-get install libparse-debian-packages-perl dh-make-perl
  dh-make-perl --cpan Heap::Priority --desc "Heap::Priority needed for Ben's AI tagger"
  dpkg -i libheap-priority-perl_0.01-1_all.deb

Step two: training.

  ./create-data.pl --max-good=100 --bad-ratio=2 filetransfer::ftp
  ./bayesian-tagger.pl --train filetransfer::ftp enrico
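The internals of bayesian-tagger.pl aren't described here, so the following is only a rough illustration of what a Bayesian tagger of this kind might do. It is a minimal two-class naive Bayes sketch in Python. It assumes that each package is reduced to a list of words (say, from its description), that "good" examples are packages carrying the tag, and that "bad" examples are packages without it. The class, its methods and the toy data are hypothetical and are not the tagger's real interface.

```python
import math
from collections import Counter


class NaiveBayes:
    """Two-class ("good"/"bad") multinomial naive Bayes with add-one smoothing.

    Hypothetical sketch: not the interface of bayesian-tagger.pl.
    Both classes need at least one training example before scoring.
    """

    def __init__(self):
        self.word_counts = {"good": Counter(), "bad": Counter()}
        self.doc_counts = {"good": 0, "bad": 0}

    def train(self, words, label):
        # Accumulate word frequencies and document counts per class.
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1

    def posterior_good(self, words):
        vocab_size = len(set(self.word_counts["good"]) | set(self.word_counts["bad"]))
        total_docs = sum(self.doc_counts.values())
        log_joint = {}
        for label in ("good", "bad"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum of log P(word | label), smoothed with +1
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            log_joint[label] = score
        # P(good | words) is the logistic function of the log-odds, computed
        # in a numerically stable way to avoid overflow in math.exp.
        diff = log_joint["good"] - log_joint["bad"]
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1.0 + z)


# Toy usage with made-up word lists:
nb = NaiveBayes()
nb.train(["ftp", "client", "download"], "good")
nb.train(["ftp", "server"], "good")
nb.train(["text", "editor"], "bad")
nb.train(["image", "viewer"], "bad")
p = nb.posterior_good(["ftp", "download"])
print(f"posterior to be good: {p:.3f}")
```

With this toy data, a package described only by "ftp" scores above 0.5 and one described only by "editor" scores below it. The +1 smoothing prevents a word seen in only one class from driving the other class's probability to zero.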

Step three: giving it a try.  Someone added filetransfer::ftp to
apt-howto-ca, which I don't agree with, and someone added it to aria,
which I agree with.  What does the AI tagger think?

  ./bayesian-tagger.pl -p apt-howto-ca filetransfer::ftp
  Package apt-howto-ca was categorized as unsure  with a posterior to be good of 0.871862649929841
  ./bayesian-tagger.pl -p aria filetransfer::ftp
  Package aria was categorized as good  with a posterior to be good of 0.958824575032506
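The output maps the posterior probability of being "good" to a verdict. The real cut-offs aren't stated. The two runs only show that 0.87 landed in "unsure" and 0.96 in "good". Here is a sketch of that kind of three-way decision. The thresholds are hypothetical and were chosen only to be consistent with those two results. It also assumes that a symmetric "bad" verdict exists.

```python
# Hypothetical cut-offs: the real tagger's values are not known. They are only
# chosen so that 0.8718... comes out "unsure" and 0.9588... comes out "good".
GOOD_THRESHOLD = 0.9
BAD_THRESHOLD = 0.1


def categorize(posterior_good: float) -> str:
    """Map a posterior probability of "good" to a three-way verdict."""
    if posterior_good >= GOOD_THRESHOLD:
        return "good"
    if posterior_good <= BAD_THRESHOLD:
        return "bad"
    return "unsure"


for pkg, p in [("apt-howto-ca", 0.871862649929841), ("aria", 0.958824575032506)]:
    print(pkg, categorize(p))
```

Only the "unsure" band needs a human. Widening it means more manual checks but fewer wrong automatic decisions.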

DUDE!  You rock!

Let's see what happens if I keep for manual verification only the
ones the AI tagger is unsure about...

 <coding... coding... coding... clickety clickety click...>

This will need a bit of scripting; I'll follow up with the results.
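The follow-up scripting could look roughly like this Python sketch, which runs the tagger once per (package, tag) pair and keeps only the pairs it reports as unsure. It relies only on the `-p` invocation and the output line format shown earlier in this mail. Extracting the pairs from the tag patch is left out because the patch format isn't described here. The helper names (parse_verdict, classify, needs_review) are hypothetical.

```python
import re
import subprocess

# Matches tagger output lines such as:
#   Package aria was categorized as good  with a posterior to be good of 0.958824575032506
VERDICT_RE = re.compile(
    r"Package (\S+) was categorized as (\w+)\s+with a posterior to be good of ([0-9.eE+-]+)"
)


def parse_verdict(line):
    """Return (package, verdict, posterior) or None if the line doesn't match."""
    m = VERDICT_RE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), float(m.group(3))


def classify(package, tag):
    """Run the tagger on one pair; assumes ./bayesian-tagger.pl is in the cwd."""
    out = subprocess.run(
        ["./bayesian-tagger.pl", "-p", package, tag],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        verdict = parse_verdict(line)
        if verdict is not None:
            return verdict
    raise ValueError(f"no verdict in tagger output for {package}")


def needs_review(pairs, classify_fn=classify):
    """Keep only the (package, tag) pairs the tagger is unsure about."""
    return [(pkg, tag) for pkg, tag in pairs if classify_fn(pkg, tag)[1] == "unsure"]
```

Passing a stand-in `classify_fn` lets you check the filtering logic without the Perl scripts installed.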


Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at enricozini.org>