How to go on with tags

Enrico Zini zinie@cs.unibo.it
Wed, 31 Dec 2003 18:56:25 +0100


Hello,

I quite agree with Evan's post: this tagging work is a huge,
non-programming task, and it is urgently needed because the Debian
package list is no longer manageable by humans for many kinds of use.

I also agree that it's not much of a programming task, but more of a
data-entry task, although programming can be of great help to ease
data entry and to produce prototypes for testing different kinds of
categorization.

As for data sources, we now have:
 - Debram
 - Trove
 - Eric's package tags (partly derived from aptitude)

Our real goal is to build an ontology of free software.  The Semantic
Web is all about ontologies, and we still haven't looked at it.

The task is quite hard and would be a nice university research project,
at least for some analysis phases to see which direction we can take.

Since we are not a university, it seems that the only option we have is
the usual one: trial and error.  So, we should set low goals and start
moving towards some small, clear improvements, then see what happens.

In this sense, debtags and the tag integration in synaptic have proven
to be a very fruitful research ground for testing ideas, and I'm sorry
that my lack of time is preventing me from doing more trial and error
with them.

Thanks to Eric and others for finding problems with the current
categorization, we can see roads to further improvement.  After my
graduation (hopefully in March), I look forward to playing with it
again.

Another "side effect" that I see in this work is the building of a
look-at-the-forest overview of free software in general.  Collecting
data and inventing algorithms to generate views of it is a very
interesting way of "taking pictures" of the free software phenomenon.
This could even turn out to be the most important aspect of our work.

I have this idea that we can't and shouldn't forecast what our
destination will be.  AFAICT we're advancing into empty territory, and
as we advance we create paths.  Later, we can have a look at these paths
and decide whether to follow one of them, and which.

As we move, we create.  Just moving is probably what we should do for
now.  It might turn out to be difficult to involve the whole Debian
community in such a chaotic, exploratory way, so we might consider some
intermediate patterns of involvement.

As we involve people, we gather data.  Data that is useless today could
be useful tomorrow.  I agree with era, though, that we should keep track
of where the data comes from, so that we can always separate data of
different kinds, merging and unmerging it as needed.

So one thing we could do now is to add a Source: field to the tag
vocabulary, and then stick everything in:
	Source: dependencies
	Source: trove
	Source: debram
	Source: aptitude
We could also gather smaller/partial contributions under this Source:
field.  We could maintain different tag collection files from different
people and merge them when we need to use them (tagcoll already does
this merging).
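
To make the merging and unmerging idea concrete, here is a rough sketch
in Python.  It is not how tagcoll works internally; it only assumes that
each collection is a text file with lines of the form
"package: tag1, tag2", and every function name and sample tag below is
made up for illustration.

```python
from collections import defaultdict


def parse_collection(text):
    """Parse lines of the assumed form 'package: tag1, tag2'.

    Returns {package: set_of_tags}.  Blank lines and '#' comments are skipped.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so tags may contain colons themselves.
        package, _, tags = line.partition(":")
        result.setdefault(package.strip(), set()).update(
            tag.strip() for tag in tags.split(",") if tag.strip()
        )
    return result


def merge_collections(collections):
    """Merge {source: {package: tags}} into {package: {tag: set_of_sources}}.

    Every tag remembers which sources asserted it, so nothing is lost.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for source, packages in collections.items():
        for package, tags in packages.items():
            for tag in tags:
                merged[package][tag].add(source)
    return merged


def unmerge(merged, drop_source):
    """Return a plain-dict copy of 'merged' without drop_source's contributions."""
    result = {}
    for package, tags in merged.items():
        kept = {}
        for tag, sources in tags.items():
            remaining = sources - {drop_source}
            if remaining:
                kept[tag] = remaining
        if kept:
            result[package] = kept
    return result


if __name__ == "__main__":
    # Hypothetical sample data, one collection per Source:
    trove = parse_collection("vim: editor, console\nmutt: mail")
    debram = parse_collection("vim: editor")
    merged = merge_collections({"trove": trove, "debram": debram})
    for package in sorted(merged):
        for tag in sorted(merged[package]):
            print(package, tag, sorted(merged[package][tag]))
```

Because each tag carries the set of sources that asserted it, dropping
one source later (with unmerge) removes only what that source alone
contributed, and tags confirmed by several sources stay in place.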

This is the way I'd go.  I hope March comes quickly.  Well, maybe not:
I still have a lot of writing up to do :)


Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
