[Debtags-devel] tagging AI, bayes and fuzzy tags

Enrico Zini enrico@enricozini.org
Tue, 30 Nov 2004 21:40:49 +0100


On Tue, Nov 30, 2004 at 08:23:53PM +0100, Benjamin Mesing wrote:

> This sounds very sensible and I wonder why this never occurred to me!
> Still there is one problem: this needs the full packages to be available
> on the training system. This might be achieved on a powerful server,
> but it is not possible on my system, so there would not be much
> opportunity to test this until such a server is available.

I can run a data-harness script on a Debian machine with a local mirror,
if needed, and then put the resulting data online.  I can even
schedule this as a cron job, so that you can get fresh data every
morning.

Is such a harness script doable?


> > of editor to assign it or not. However, there is a user, whose judgement
> > may be a different one, thus my worry that such poorly defined tags cause
> > more harm than good.
> Hmm, I tend to agree with this here. Nevertheless I think there are
> cases which are quite well defined where Bayesian tagging might fail due
> to too much diversity.

Sure.  And the tagging precision could give an index of how good a
tag is, and an indication of where the ontology needs reworking.  That
would be so cool!
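As a rough illustration of the per-tag precision idea, here is a minimal sketch. It assumes we have, for the same packages, the tags suggested by an automatic (e.g. Bayesian) tagger and a hand-checked reference set; the helper `tag_precision` and all package/tag names below are hypothetical and not part of debtags.

```python
from collections import defaultdict

def tag_precision(suggested, reference):
    """Hypothetical helper: per-tag precision of an automatic tagger.

    suggested, reference: dict mapping package name -> set of tags.
    For each tag T, returns (suggestions of T confirmed by the
    reference) / (all suggestions of T), a value in [0, 1].
    """
    hits = defaultdict(int)    # suggestions confirmed by the reference
    totals = defaultdict(int)  # all suggestions made for each tag
    for pkg, tags in suggested.items():
        correct = reference.get(pkg, set())
        for tag in tags:
            totals[tag] += 1
            if tag in correct:
                hits[tag] += 1
    return {tag: hits[tag] / totals[tag] for tag in totals}

# Illustrative data only; package and tag names are made up.
suggested = {
    "mutt": {"mail", "interface:text"},
    "gimp": {"graphics", "mail"},
    "vim": {"editor", "interface:text"},
}
reference = {
    "mutt": {"mail", "interface:text"},
    "gimp": {"graphics", "interface:x11"},
    "vim": {"editor", "interface:text"},
}

# Lowest-precision tags first: these are the candidates for ontology rework.
precision = tag_precision(suggested, reference)
for tag, p in sorted(precision.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{tag}: {p:.2f}")
```

Here `mail` comes out lowest (0.50), because the tagger also put it on `gimp`; a tag that stays low across many packages would be the kind of poorly defined tag discussed above.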


> > Hmm, as I typed in the subject, I got another idea. It may be useful to add
> > fuzzy tags: currently tags are discrete values, 0 or 1, for each
> > package. What about allowing the real range 0-1 there? ;) Well, this is a
> > bit of an off-the-cuff idea, but IMHO it makes sense, though I'm not sure
> > where its place is ;) I leave judgment of the implications to you, dear reader.
> This sounds cool, as it would allow ranking search results with the most
> relevant at the top! But I would definitely not want to implement this; let's
> see what Enrico says :-) Another problem is that it would make Bayesian
> tagging unsuitable.
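To make the ranking idea concrete, here is a minimal sketch. It assumes each package carries a weight in [0, 1] per tag instead of a boolean; the function `rank_packages`, the scoring rule (summing the weights of the queried tags) and the sample data are all hypothetical, chosen only for illustration.

```python
def rank_packages(fuzzy_tags, query):
    """Hypothetical ranking over fuzzy tags.

    fuzzy_tags: dict mapping package -> dict mapping tag -> weight in [0, 1].
    query: iterable of tags the user is searching for.
    Score (an assumption): sum of the package's weights for the queried tags.
    Packages scoring 0 are dropped; the rest are sorted best first.
    """
    query = set(query)
    scores = {
        pkg: sum(w for tag, w in tags.items() if tag in query)
        for pkg, tags in fuzzy_tags.items()
    }
    ranked = [(pkg, s) for pkg, s in scores.items() if s > 0]
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

# Made-up weights for illustration only.
fuzzy_tags = {
    "mutt": {"mail": 1.0, "interface:text": 0.9},
    "emacs": {"editor": 1.0, "mail": 0.3, "interface:text": 0.6},
    "gimp": {"graphics": 1.0},
}

for pkg, score in rank_packages(fuzzy_tags, ["mail", "interface:text"]):
    print(f"{pkg}: {score:.2f}")
```

With today's binary tags every weight is 1, so this score reduces to the number of matching tags; the fractional weights are what would let a package that is only marginally about `mail` sink below a dedicated mail client.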

It would be really subjective to assign such values to tags, and it
would probably require more effort than it's worth.

I'd apply the YouAintGonnaNeedItYet pattern
(http://www.c2.com/cgi/wiki?YouArentGonnaNeedIt) and leave this for when
we really, absolutely need it.  There are more important things we need
right now anyway ;)

/me leaves to meet Free Ekanayaka and Kalfa to talk about debtags with
some glasses of wine in a local osteria.


Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@enricozini.org>
