[Debtags-devel] tagging AI, bayes and fuzzy tags

Peter Rockai (mornfall) mornfall@kalyxo.org
Sun, 28 Nov 2004 18:01:15 +0100


Hello!

I see the Bayes network idea is being worked on, even though I only follow
the list loosely. I have seen that you have slight problems with accuracy,
and I would assert that part of this is due to insufficient input data per
package. So the logical next proposal would be to widen the data base for
individual packages. Candidates include /usr/share/doc/<package> (after
decompression, no good feeding Bayes with gzip data :)), manpages and so
forth. I am fairly convinced these would improve general tagging accuracy.

The script also raises another question. How useful is a tag (say,
role::utility) if the packages carrying it tend to be rather unrelated? How
is utility defined? I tend to think that tags that give very poor results
with the Bayesian filter are rather loosely defined and will be troublesome
for human editors as well. Since there is no strong definition of
role::utility, it is left to the editor's judgement whether to assign it or
not. However, there is also the user, whose judgement may differ, hence my
worry that such poorly defined tags cause more harm than good: because the
user may have a different idea of what a utility is than the editor, it can
skew the search results... Well, 'nuff said, I hope you get what I mean :).
I'll leave what to do about this problem to you, since it's your project
after all (whoever "you" means here). But I think the Bayes filter is
highlighting a problem here, which may be more of a problem with the tag
than with the Bayes...
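
To make the "widen the data base" suggestion a bit more concrete, here is a
minimal sketch in Python of gathering extra text for one package from its
documentation directory, decompressing gzip files on the way. It is only an
illustration under assumptions: the function names read_doc_file and
collect_package_text are made up for this example, the documentation root
is a parameter, and nothing here reflects how the existing script actually
reads its input.

```python
import gzip
from pathlib import Path


def read_doc_file(path: Path) -> str:
    """Return the text of one documentation file, decompressing .gz files."""
    if path.suffix == ".gz":
        # Text mode so the classifier sees words, not compressed bytes.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            return handle.read()
    return path.read_text(encoding="utf-8", errors="replace")


def collect_package_text(package: str, doc_root: str = "/usr/share/doc") -> str:
    """Concatenate all readable files under <doc_root>/<package>.

    Hypothetical helper: returns an empty string when the package has no
    documentation directory.
    """
    package_dir = Path(doc_root) / package
    if not package_dir.is_dir():
        return ""
    chunks = []
    for path in sorted(package_dir.rglob("*")):
        if path.is_file():
            try:
                chunks.append(read_doc_file(path))
            except (OSError, EOFError):
                # Unreadable or corrupt file: skip it rather than abort.
                continue
    return "\n".join(chunks)
```

The returned string could then be appended to whatever text is already fed
to the filter for that package. Manpages would need a similar walk over
their own directory, which is left out here.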

Hmm, as I typed in the subject, I got another idea. It may be useful to add
fuzzy tags. Currently tags are discrete values, 0 or 1, for each package.
What about allowing the real range 0-1 there? ;) Well, this is a bit of an
off-the-cuff idea, but IMHO it makes sense, though I'm not sure where its
place is ;). I leave judging the implications of this to you, dear reader.
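
As a rough sketch of what fuzzy tags could look like, the Python below
stores a weight between 0 and 1 per (package, tag) pair and searches with a
cut-off. Everything in it is an assumption made for illustration: the names
set_tag and search, the in-memory dictionary, the default threshold of 0.5
and the example package names are not part of debtags.

```python
from typing import Dict, List, Tuple

# package name -> {tag name -> weight in [0, 1]}
FuzzyTags = Dict[str, Dict[str, float]]


def set_tag(db: FuzzyTags, package: str, tag: str, weight: float) -> None:
    """Record how strongly a tag applies to a package (0 = not at all, 1 = fully)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in the range 0-1")
    db.setdefault(package, {})[tag] = weight


def search(db: FuzzyTags, tag: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (package, weight) pairs at or above the threshold, strongest first."""
    hits = [
        (package, weights[tag])
        for package, weights in db.items()
        if weights.get(tag, 0.0) >= threshold
    ]
    # Sort by descending weight, then by name to keep the order stable.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical example data.
db: FuzzyTags = {}
set_tag(db, "pkg-a", "role::utility", 1.0)
set_tag(db, "pkg-b", "role::utility", 0.6)
set_tag(db, "pkg-c", "role::utility", 0.2)
results = search(db, "role::utility")
```

Today's discrete tags are the special case where every stored weight is
1.0, so a scheme like this would not have to break existing data.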

Anyway, thanks for reading and thumbs up for the effort,

Yours,
    Peter.

-- 
Peter Rockai (mornfall)  |  mornfall()kalyxo!org  |  http://www.kalyxo.org
---------------------------------------------------------------------------
He says gods like to see an atheist around. Gives them something to aim at.
                       -- (Terry Pratchett, Small Gods)
