[Debtags-devel] "tagcoll findspecials"
Enrico Zini
enrico@enricozini.org
Fri, 20 May 2005 15:23:00 +0200
Hello,
I'm doing some research on finding ways of automatically creating a list
of "toplevel facets", defined as the minimum set of facets that one
could display as starting points for a search and still be able to find
all packages from there.
In this quest, I implemented a 'facetcoll' function in 'debtags', that
can be used to generate a tagged collection in which every package is
tagged with just the facets of its tags. For example, debtags
(implemented-in::c++, interface::commandline, suite::debian,
use::searching) would become tagged with just "implemented-in,
interface, suite, use".
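The tag-to-facet reduction described above can be sketched in a few lines of awk. This is only an illustration, not the actual 'facetcoll' code: it assumes the collection arrives as one "package: tag, tag, ..." line per package, and the function name facets_only is made up for this example.

```shell
# Hypothetical helper, not part of debtags: reduce every tag to its facet.
# Assumes input lines of the form "package: facet::tag, facet::tag, ...".
facets_only() {
  awk -F': ' '{
    n = split($2, tags, ", ")
    out = ""
    split("", seen)                 # reset the per-package set of facets
    for (i = 1; i <= n; i++) {
      facet = tags[i]
      sub(/::.*/, "", facet)        # keep only what precedes "::"
      if (!(facet in seen)) {       # list each facet once per package
        seen[facet] = 1
        out = out (out == "" ? "" : ", ") facet
      }
    }
    print $1 ": " out
  }'
}
```

Feeding it the debtags line from the example should give back the four facets listed above, each appearing once even when a package carries several tags of the same facet.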
Once I have this collection, I can run tagcoll on it and get something
fun:
# Count the number of facets
$ debtags facetcoll | tagcoll reverse | wc -l
31
# Get the list of toplevel facets if we created a smart hierarchy with
# the facets only
debtags facetcoll | tagcoll hierarchy | cut -f2 -d/ | cut -f1 -d: | sort | uniq | wc -l
26
And that's a first narrowing step: from 31, we got down to 26. Having
a look around, I think that 26 could become much better. The toplevel
facets include stuff like 'web', 'x11' or 'uitoolkit' which I feel could
get out of there somehow. In fact, 'uitoolkit' should all be inside
'interface' somehow. What are the packages that have
uitoolkit::something and not interface::something?
debtags facetcoll | grep uitoolkit | grep -v interface | cut -d: -f1 |
sort | uniq | wc -l
2140
2140 packages that probably need some love. Some examples: xvncviewer
(should be interface::x11), wesnoth (should be interface::sdl) and so
on. (I just sent a tag patch for xvncviewer).
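A slightly stricter variant of the grep pipeline above, as a sketch: plain grep matches anywhere on the line, so a package whose name happens to contain "uitoolkit" or "interface" would be miscounted. The helper below compares whole facet names instead. It assumes the same "package: facet, facet, ..." line format, and has_without is a made-up name for this example.

```shell
# Hypothetical helper: print packages that carry facet $1 but not facet $2.
# Assumes input lines of the form "package: facet, facet, ...".
has_without() {
  awk -F': ' -v want="$1" -v avoid="$2" '{
    n = split($2, facets, ", ")
    has_want = 0
    has_avoid = 0
    for (i = 1; i <= n; i++) {
      if (facets[i] == want)  has_want = 1    # exact facet match only
      if (facets[i] == avoid) has_avoid = 1
    }
    if (has_want && !has_avoid) print $1
  }'
}
```

Under that assumption it could be used as "debtags facetcoll | has_without uitoolkit interface | sort -u | wc -l".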
This is a nice way of seeing where work is needed. But we can have
more.
I then implemented a new 'findspecials' feature in 'tagcoll': it creates
a smart hierarchy, and then for each toplevel node it shows which
packages made it a toplevel node instead of letting it be placed
inside some other node.
debtags facetcoll | tagcoll findspecials
(see results in http://www.enricozini.org/store/specials.txt)
Look at 'dbtech' there: only 5 special items! It hardly seems worth
being toplevel for just 5 items, does it?
wget -qO- http://www.enricozini.org/store/specials.txt | grep -v '^ '
special: 4425 items, 0 special items:
devel: 3783 items, 3736 special items:
role: 2676 items, 2072 special items:
uitoolkit: 2257 items, 1231 special items:
use: 2009 items, 923 special items:
langdevel: 1951 items, 646 special items:
suite: 1585 items, 124 special items:
media: 1459 items, 323 special items:
interface: 1020 items, 109 special items:
protocol: 578 items, 89 special items:
game: 577 items, 13 special items:
format: 415 items, 32 special items:
implemented-in: 395 items, 17 special items:
hardware: 390 items, 90 special items:
debian-edu: 375 items, 29 special items:
culture: 340 items, 101 special items:
field: 306 items, 59 special items:
data: 263 items, 27 special items:
x11: 259 items, 45 special items:
admin: 241 items, 82 special items:
web: 240 items, 11 special items:
security: 221 items, 33 special items:
sound: 158 items, 20 special items:
dbtech: 153 items, 5 special items:
accessibility: 55 items, 21 special items:
That's another place that could use some love! Look at game, format,
implemented-in, data, web, dbtech, sound, accessibility...
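To pick out such candidates without reading the whole list, the summary lines can be filtered by their special-item count. A minimal sketch, assuming the "name: N items, M special items:" layout shown above; few_specials and its threshold argument are made up for this example.

```shell
# Hypothetical helper: list toplevel facets with at most $1 special items,
# fewest first. Assumes lines like "dbtech: 153 items, 5 special items:".
few_specials() {
  awk -F'[:, ]+' -v max="$1" '
    # skip indented detail lines, keep only the per-facet summary lines
    /^[^ ]/ && $5 == "special" && $4 + 0 <= max + 0 { print $4, $1 }
  ' | sort -n
}
```

For example, "wget -qO- http://www.enricozini.org/store/specials.txt | few_specials 20" would print the facets with 20 or fewer special items, one "count name" pair per line.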
To wrap it up, this looks like a good way to go. It is not working well
yet, not because the algorithm is bad, but because the data could be
better. Plus, we now have a way of spotting what needs more work.
/me is considering auto-generating some HTML pages with TODO-lists of
packages pointing at Erich's packagebrowser. Let me hack a bit into
it...
Ciao,
Enrico
--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>