[Debtags-devel] Second Preview

Benjamin Mesing bensmail@gmx.net
Wed, 13 Oct 2004 16:30:41 +0200


Hello,

>   Package: libhs
>   Priority: optional
>   Section: libs
>   Installed-Size: 68
>   Maintainer: RISKO Gergely <risko@debian.org>
>   Architecture: i386
>   Version: 0.1.3
>   Depends: libc6 (>= 2.2.4-4)
>   Filename: pool/main/libh/libhs/libhs_0.1.3_i386.deb
>   Size: 5556
>   MD5sum: 4c3e84b6407a8f65ec71ea7f803524cc
>   Description: The HighScore Library (run-time library)
>    The HighScore Library is a small library to make the programming
>    of high score tables easier. Its features:
>      * you can give default data for the case if no previous highscores
>      * you can simple insert to a table with use of a comparison function
>      * you can store any data with the score itself
>      * FILE LOCKING
>      * more difficulty level for the same program
>    .
>    Now this library is used by bombardier (0.7.3) or above
Well, actually there could be some clues for tagging this. E.g. other
game packages will depend on it, other game packages could contain the
tokens highscore, level or difficulty, or perhaps the maintainer
maintains other game packages.
I wanted to test this, but my tag database seems to be quite messy.
"debtags grep game" yields


But I think I get your point. There will always be packages where the AI
cannot decide correctly. In the optimal case the AI would tell you that
it is uncertain, but Bayesian filters seldom report uncertainty, which
is a common criticism of Naive Bayes. Currently I do not even try to
detect whether the classification of a package is uncertain: everything
the system does not consider good is bad. I also think there are a lot
of packages that different humans would tag differently, so the AI is,
as you said, simply another opinion. I think the system could, if
running on a server, give hints to package maintainers when they do the
tagging themselves. Giving those hints is especially necessary because
most maintainers probably won't know about all the tags out there.
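
The difference between a plain good/bad split and a decision that can
also say "uncertain" could be sketched roughly as follows. This is
Python rather than the Perl of the tagger, and the function name
classify as well as the thresholds low and high are hypothetical,
chosen only for illustration; nothing here is taken from
bayesian-tagger.pl.

```python
def classify(good_posterior, low=0.2, high=0.8):
    """Map a combined 'good' posterior to one of three verdicts.

    low and high are illustrative thresholds (assumptions), not values
    used by the real tagger.
    """
    if not 0.0 <= good_posterior <= 1.0:
        raise ValueError("posterior must lie in [0, 1]")
    if good_posterior >= high:
        return "good"
    if good_posterior <= low:
        return "bad"
    # Anything between the two thresholds is left for a human to decide.
    return "uncertain"


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.012):
        print(p, classify(p))
```

Since Naive Bayes tends to push the combined value close to 0 or 1,
the "uncertain" band would probably be hit only rarely, which is the
criticism mentioned above.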

Greetings Ben

I gave libhs a try - using use::gaming which seemed to me the only tag
which might suit it. Here are the results:

~/lang/perl/bayesianTagger> ./bayesian-tagger.pl -nt -v -v use::gaming
main::trainPackage() called too early to check prototype at
./bayesian-tagger.pl line 80.
main::trainPackage() called too early to check prototype at
./bayesian-tagger.pl line 84.
Loaded 3694 tokens from use__gaming/good.db
Loaded 4354 tokens from use__gaming/bad.db
Loaded 303 good and 265 bad messages.
Testing libhs
Posterior: (programming, 0.099437148217636)
Posterior: (score, 0.883595850941222)
Posterior: (library, 0.140198153135821)
Posterior: (7, 0.159159159159159)
Posterior: (file, 0.216414863209473)
Posterior: (high, 0.777696258253852)
Posterior: (level, 0.777696258253852)
Posterior: (tables, 0.227467811158798)
Posterior: (small, 0.732125096695768)
Posterior: (use, 0.270082460854257)
Posterior: (difficulty, 0.69790628115653)
Posterior: (function, 0.30635838150289)
Posterior: (3, 0.320158784949948)
Posterior: (make, 0.326999012833169)
Posterior: (now, 0.671128798842258)
Good posterior: 0.0122532843848668
BAD: good package libhs did not match!

So it seems that even though there were some clues for use::gaming, they
weren't sufficient to convince the filter.
Another point can be seen here. Even though the individual
probabilities do not look that clear-cut to me, the Bayesian filter says
that the likelihood of this package being good is only 0.012, i.e. 1.2%.
This is what I mentioned above.
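
To see how fifteen fairly moderate per-token values can end up at
about 1.2%, here is a minimal sketch of the usual Naive Bayes
combination with equal priors, prod(p) / (prod(p) + prod(1 - p)),
computed in log-odds form. It is written in Python rather than Perl,
and the names TOKEN_POSTERIORS and combine are hypothetical. The code
of bayesian-tagger.pl is not shown in this mail, so this is an
assumption about the combination step, although it does come out at
the value reported in the run above.

```python
import math

# Per-token values copied from the "Posterior:" lines of the run above.
TOKEN_POSTERIORS = {
    "programming": 0.099437148217636,
    "score": 0.883595850941222,
    "library": 0.140198153135821,
    "7": 0.159159159159159,
    "file": 0.216414863209473,
    "high": 0.777696258253852,
    "level": 0.777696258253852,
    "tables": 0.227467811158798,
    "small": 0.732125096695768,
    "use": 0.270082460854257,
    "difficulty": 0.69790628115653,
    "function": 0.30635838150289,
    "3": 0.320158784949948,
    "make": 0.326999012833169,
    "now": 0.671128798842258,
}


def combine(posteriors):
    """Combine per-token posteriors, assuming independence and equal priors.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)); summing log-odds
    avoids underflow. Every value must lie strictly between 0 and 1.
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in posteriors)
    return 1.0 / (1.0 + math.exp(-log_odds))


combined = combine(TOKEN_POSTERIORS.values())
print(f"combined good posterior: {combined:.4f}")  # prints 0.0123
```

Nine of the fifteen tokens lean towards "bad" and six towards "good".
Because the independence assumption multiplies the odds of all tokens
together, this modest imbalance compounds into a very confident
verdict, even though no single token is far from the middle.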