[Debtags-devel] tagging AI, bayes and fuzzy tags
Peter Rockai (mornfall)
mornfall@kalyxo.org
Sun, 28 Nov 2004 18:01:15 +0100
Hello!
I see the Bayes network idea is being worked on, even if I only follow the list loosely. I have seen that you have slight problems with accuracy, and I'd like to suggest that part of this is due to insufficient input data per package. So the logical next step would be to widen the data base for individual packages. Candidates include /usr/share/doc/<package> (after decompression; there's no point feeding Bayes with gzip data :)), manpages and so forth. I am fairly convinced these should improve general tagging accuracy.
Also, the script raises another question. How useful is a tag (say, role::utility) if the packages tagged with it tend to be rather unrelated? How is "utility" defined? I tend to think that tags that give very poor results with the Bayesian filter are rather loosely defined and will be troublesome with human editors as well.
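That hunch can be checked mechanically by scoring the classifier tag by tag. A minimal sketch, assuming the human-assigned tags and the classifier's guesses are both available as dicts mapping a package name to a set of tags; the function names, the 0.5 threshold and the sample tag data are all hypothetical, chosen only for illustration:

```python
from collections import defaultdict


def per_tag_f1(actual, predicted):
    """Per-tag F1 score of classifier guesses against human-assigned tags.

    Both arguments map package name -> set of tags (hypothetical layout).
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pkg, true_tags in actual.items():
        guessed = predicted.get(pkg, set())
        for tag in true_tags & guessed:
            tp[tag] += 1  # classifier agreed with the editor
        for tag in guessed - true_tags:
            fp[tag] += 1  # classifier added a tag the editor did not
        for tag in true_tags - guessed:
            fn[tag] += 1  # classifier missed a tag the editor set
    scores = {}
    for tag in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores[tag] = 2 * tp[tag] / denom if denom else 0.0
    return scores


def flag_loose_tags(scores, threshold=0.5):
    """Tags the classifier handles poorly: candidates for a sharper definition."""
    return sorted(tag for tag, f1 in scores.items() if f1 < threshold)


# Illustrative data only, not real debtags assignments.
actual = {
    "grep": {"role::utility", "use::searching"},
    "vim": {"role::program", "use::editing"},
    "bc": {"role::utility"},
}
predicted = {
    "grep": {"use::searching"},
    "vim": {"role::program", "use::editing", "role::utility"},
    "bc": {"role::program"},
}
scores = per_tag_f1(actual, predicted)
print(flag_loose_tags(scores))  # ['role::utility']
```

A low score alone does not prove a tag is badly defined (it may simply have too few examples), but it gives a concrete list of tags worth a second look.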
Since there is no strong definition of role::utility, it is left to the editor's judgement whether to assign it or not. However, there is also the user, whose judgement may differ, hence my worry that such poorly defined tags cause more harm than good. Because a user may have a different idea of what a utility is than the editor, it can skew the search results... Well, 'nuff said, I hope you get what I mean :). I'll leave what to do about this problem to you, since it's your project after all (whoever "you" means here). But I think the Bayes filter is highlighting a problem here, which may be more a problem with the tag than with Bayes...
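Returning to the first suggestion, gathering extra text per package from /usr/share/doc/<package> could look roughly like the sketch below. It is not part of any existing debtags tool: read_text and package_text are hypothetical names, .gz files are decompressed with Python's gzip module before their text is used, and files containing NUL bytes are skipped as likely binaries. Manpages could be passed in through the hypothetical extra_paths parameter (for example after locating them in the package's file list), but their troff markup is not stripped here.

```python
import gzip
import os
import tempfile


def read_text(path):
    """Read a file as text, decompressing it first if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read()


def package_text(package, doc_root="/usr/share/doc", extra_paths=()):
    """Concatenate readable documentation for one package (hypothetical helper)."""
    paths = []
    pkg_dir = os.path.join(doc_root, package)
    if os.path.isdir(pkg_dir):
        for dirpath, dirnames, filenames in os.walk(pkg_dir):
            dirnames.sort()  # deterministic traversal order
            paths.extend(os.path.join(dirpath, n) for n in sorted(filenames))
    paths.extend(extra_paths)  # e.g. manpage files located elsewhere

    chunks = []
    for path in paths:
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # skip symlinks and anything that is not a regular file
        try:
            text = read_text(path)
        except (OSError, EOFError):
            continue  # unreadable file, or a corrupt/truncated .gz
        if "\x00" in text:
            continue  # NUL bytes: most likely a binary file (image, PDF...)
        chunks.append(text)
    return "\n".join(chunks)


if __name__ == "__main__":
    # Demo on a throwaway tree; point doc_root at /usr/share/doc for real use.
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "hello")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "README"), "w") as fh:
            fh.write("Prints a friendly greeting.\n")
        with gzip.open(os.path.join(pkg_dir, "changelog.gz"), "wt") as fh:
            fh.write("hello (1.0) unstable; initial release\n")
        print(package_text("hello", doc_root=root))
```

The resulting string would then be fed to the Bayes classifier alongside the package description.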
Hmm, as I typed in the subject, I got another idea. It may be useful to add fuzzy tags: currently tags are discrete values, 0 or 1, for each package. What about allowing the real range 0-1 there? ;) This is a bit of an off-the-cuff idea, but IMHO it makes sense, although I'm not sure where its place is ;). I leave judging the implications of this to you, dear reader.
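To make the fuzzy-tag idea concrete, here is a minimal sketch, assuming each package stores a membership degree in [0, 1] per tag (the degrees could come, say, from the share of editors who agree, or from the Bayes classifier's probability; both are assumptions, not an existing debtags feature). A multi-tag query is combined with the standard fuzzy AND, the minimum of the degrees; the product would be another valid choice. All names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

FuzzyTags = Dict[str, float]  # tag -> membership degree in [0, 1]


def make_fuzzy(tags: Dict[str, float]) -> FuzzyTags:
    """Validate that every membership degree lies in [0, 1]."""
    for tag, degree in tags.items():
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"{tag}: degree {degree} outside [0, 1]")
    return dict(tags)


def match(tags: FuzzyTags, query: List[str]) -> float:
    """Degree to which a package has *all* query tags (fuzzy AND = min)."""
    return min((tags.get(t, 0.0) for t in query), default=1.0)


def search(db: Dict[str, FuzzyTags], query: List[str],
           cutoff: float = 0.0) -> List[Tuple[str, float]]:
    """Packages scoring above cutoff, best match first (ties by name)."""
    hits = [(pkg, match(tags, query)) for pkg, tags in db.items()]
    hits = [(pkg, score) for pkg, score in hits if score > cutoff]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Illustrative degrees only.
db = {
    "grep": make_fuzzy({"role::utility": 0.9, "use::searching": 1.0}),
    "bc": make_fuzzy({"role::utility": 0.6}),
    "vim": make_fuzzy({"role::utility": 0.2, "use::editing": 1.0}),
}
print(search(db, ["role::utility"], cutoff=0.5))  # [('grep', 0.9), ('bc', 0.6)]
```

The cutoff lets each user decide how loosely a tag should apply, which softens the editor-versus-user disagreement about what counts as a utility.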
Anyway, thanks for reading and thumbs up for the effort,
Yours,
Peter.
-- 
Peter Rockai (mornfall) | mornfall()kalyxo!org | http://www.kalyxo.org
---------------------------------------------------------------------------
He says gods like to see an atheist around. Gives them something to aim at.
-- (Terry Pratchett, Small Gods)