[Debtags-devel] Autodebtag and Debram

Enrico Zini enrico@enricozini.org
Mon, 18 Oct 2004 14:53:49 +0200


Hello,

prompted by some ideas that came up in yesterday's discussion, I did a
bit of work on autodebtag and the debram database.

One of these ideas was to check whether all the packages in a single
debram ramification had at least one tag in common.  It turned out that,
out of 302 ramifications, 248 had no common tag.  This is a way to get
suggestions for debtags: by looking at those 248 we can spot cases in
which the tagging is insufficient.
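The common-tag check above boils down to intersecting the tagsets of all
packages in each ramification and flagging those where the intersection is
empty.  A minimal Python sketch follows; the input shapes (plain dicts) and
the package names foo/bar/baz/qux are hypothetical, not autodebtag's actual
data structures, and the tags are just ones borrowed from the example below.

```python
# Sketch: find debram ramifications whose packages share no debtags tag.
# ASSUMED input shapes (hypothetical, not autodebtag's real structures):
#   ramifications: dict mapping ramification id -> list of package names
#   package_tags:  dict mapping package name -> set of tags

def common_tags(tagsets):
    """Tags present in every tagset (their intersection); empty if none given."""
    if not tagsets:
        return set()
    return set.intersection(*tagsets)

def ramifications_without_common_tags(ramifications, package_tags):
    """Ids of ramifications whose packages have no tag in common.

    An untagged package counts as an empty tagset, so it forces a miss."""
    missing = []
    for ram_id, packages in ramifications.items():
        tagsets = [set(package_tags.get(p, ())) for p in packages]
        if not common_tags(tagsets):
            missing.append(ram_id)
    return missing

# Made-up example data.
ramifications = {"A": ["foo", "bar"], "B": ["baz", "qux"]}
package_tags = {
    "foo": {"role::client", "role::utility"},
    "bar": {"role::utility"},
    "baz": {"cd"},
    "qux": {"file-formats"},
}
print(ramifications_without_common_tags(ramifications, package_tags))  # ['B']
```

Ramification "A" survives because both packages carry role::utility; "B"
is flagged because its two tagsets are disjoint.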

Another idea was to merge the tagsets of all packages in a ramification
and check whether the distance of each package's tagset from that union
is too big.  That is, to check that packages in the same ramification
are somehow similar according to debtags as well.

Some example results:
  Ramification 1965 - merged tags: cd, role::utility, file-formats, role::client, media::file
	Package cddb has distance 4 from the merged tagset
	Package cd-discid has distance 3 from the merged tagset

  It turns out that cddb has "role::client, role::utility" while cd-discid
  only has "role::utility": well spotted!
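The distance check can be sketched as: take the union of the tagsets in a
ramification, then score each package by how far its own tagset is from
that union.  The metric behind the numbers above is not stated, so this
sketch ASSUMES a simple one (the number of merged tags the package lacks);
its values are therefore not expected to reproduce the cddb/cd-discid
figures.  The threshold and the package names are hypothetical.

```python
# Sketch: flag packages that are far from their ramification's merged tagset.
# ASSUMPTION: distance = number of tags in the merged set the package lacks.

def merged_tagset(tagsets):
    """Union of all the given tagsets."""
    return set().union(*tagsets)

def tag_distance(tags, merged):
    """Hypothetical metric: size of the set difference merged - tags."""
    return len(merged - tags)

def distant_packages(packages, package_tags, threshold):
    """List (package, distance) pairs whose distance exceeds `threshold`."""
    tagsets = {p: set(package_tags.get(p, ())) for p in packages}
    merged = merged_tagset(tagsets.values())
    result = []
    for pkg, tags in tagsets.items():
        d = tag_distance(tags, merged)
        if d > threshold:
            result.append((pkg, d))
    return result

# Made-up example data.
package_tags = {
    "foo": {"cd", "role::utility", "file-formats"},
    "bar": {"cd", "role::utility"},
    "baz": {"media::file"},
}
print(distant_packages(["foo", "bar", "baz"], package_tags, threshold=2))
# [('baz', 3)]
```

Because every tagset is a subset of the union, this count equals the size
of the symmetric difference; a normalised variant (dividing by the size of
the union) would make ramifications of different sizes easier to compare.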

Another idea is to add the name of the ramification to the package
metadata fed to the Bayesian scanner.  This way, some of debram's
knowledge about similarity among packages could make its way into the
generated tags.  I haven't looked into this yet, though.
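The email does not describe autodebtag's Bayesian scanner, so what follows
is only a sketch under assumptions: a tiny multinomial naive Bayes
classifier with add-one smoothing decides whether a single tag applies,
and the ramification is appended to the package's words as one extra
pseudo-word.  The "debram:" prefix, the features() helper, the
NaiveBayesTag class and the training data are all hypothetical.

```python
import math
from collections import Counter

def features(description, ramification=None):
    """Lowercased description words, plus a hypothetical pseudo-word
    "debram:<id>" carrying the package's debram ramification."""
    tokens = description.lower().split()
    if ramification is not None:
        tokens.append(f"debram:{ramification}")
    return tokens

class NaiveBayesTag:
    """Multinomial naive Bayes deciding whether ONE tag applies."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, tokens, has_tag):
        self.doc_counts[has_tag] += 1
        self.word_counts[has_tag].update(tokens)
        self.vocab.update(tokens)

    def log_score(self, tokens, has_tag):
        # Smoothed prior plus smoothed per-word likelihoods (add-one).
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[has_tag] + 1) / (total_docs + 2))
        counts = self.word_counts[has_tag]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, tokens):
        return self.log_score(tokens, True) > self.log_score(tokens, False)

# Made-up training data: packages in ramification 1965 have the tag.
model = NaiveBayesTag()
model.train(features("query the cddb database", "1965"), True)
model.train(features("read the disc id of a cd", "1965"), True)
model.train(features("a text editor", "100"), False)
model.train(features("a web browser", "200"), False)

print(model.predict(features("small tool", "1965")))  # True
print(model.predict(features("small tool")))          # False
```

With these toy numbers, a package whose description words were never seen
is pulled towards the tag only when its ramification pseudo-word is
present, which is the effect the idea above is after.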

The general idea, in any case, is to take debram's notion of "these
packages are somehow similar" and compare it with debtags' notion of
similarity.  At the moment debram's notion turns out to be the more
reliable one, so it can be used to check debtags.
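To put a number on "how similar does debtags think a ramification's
packages are", one option is the Jaccard index between tagsets, averaged
over every pair of packages in the ramification.  This is an assumption:
the email does not say how debtags similarity is, or should be, measured.
Package names are hypothetical; the tags are taken from the example above.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_similarity(packages, package_tags):
    """Average Jaccard similarity over all pairs of packages (1.0 if < 2)."""
    pairs = list(combinations(packages, 2))
    if not pairs:
        return 1.0
    total = sum(
        jaccard(set(package_tags.get(p, ())), set(package_tags.get(q, ())))
        for p, q in pairs
    )
    return total / len(pairs)

# Made-up example data.
package_tags = {
    "foo": {"cd", "role::utility"},
    "bar": {"cd", "role::client"},
    "baz": {"cd", "role::utility"},
}
print(mean_pairwise_similarity(["foo", "bar", "baz"], package_tags))
```

A ramification with a low average is one where debram groups packages that
debtags tags quite differently, so it is a candidate for review.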

I also wrote and committed the Perl modules Set.pm and Debtags.pm, which
look very much like the start of a libdebtags-perl, and are already
quite useful and clean.


Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>