[Debtags-devel] Re: fuzzy part (unsure)
Benjamin Mesing
bensmail@gmx.net
Sat, 06 Nov 2004 17:58:09 +0100
> > Added a fuzzy part (unsure). Currently we are unsure if the
> > likelihood is between 0.2 and 0.9
>
> How promising does it look?
See for yourself below (and compare with the results I posted before).
The general trend is that we lose more matches than mismatches, so most
of the unsure packages are ones that would otherwise have been
categorized correctly. Nevertheless, I consider it better than the
original solution, if only because treating every package that is not
good with a probability of at least 90% as bad is a rather insane
approach.
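
To make the three-way decision concrete, here is a minimal sketch in
Python. It is not the code from bayesian-tagger.pl: the function name
classify is hypothetical, and the handling of the exact boundary values
(0.2 counts as bad, 0.9 counts as good) is an assumption, since only the
0.2 and 0.9 bounds themselves are stated above.

```python
def classify(likelihood, lower=0.2, upper=0.9):
    """Map a likelihood in [0, 1] to 'good', 'bad' or 'unsure'.

    Assumption: values strictly between lower and upper are 'unsure';
    the treatment of the boundaries themselves is a guess.
    """
    if likelihood >= upper:
        return "good"
    if likelihood <= lower:
        return "bad"
    return "unsure"


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.1):
        print(p, classify(p))
```

With the earlier two-way rule, the 0.5 case would have been reported as
bad; here it is reported as unsure instead.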
Currently I am working on a small CGI script that allows the
application to be stored on a server.

Greetings
Ben
./bayesian-tagger.pl -nt role__server/
Tested packages: 491
Expected to be good: 246
Expected to be bad: 245
Matches: 382 ^= 0.778004073319756
Mismatches: 53 ^= 0.107942973523422
Unsure: 56 ^= 0.114052953156823
Expected good, but wielded bad: 10 ^= 0.040650406504065
Expected good, but wielded unsure: 10 ^= 0.040650406504065
Expected good, and wielded good: 226 ^= 0.91869918699187
Expected bad, but wielded good: 43 ^= 0.175510204081633
Expected bad, but wielded unsure: 46 ^= 0.187755102040816
Expected bad, and wielded bad: 156 ^= 0.636734693877551
./bayesian-tagger.pl -nt implemented-in__c
Tested packages: 99
Expected to be good: 50
Expected to be bad: 49
Matches: 51 ^= 0.515151515151515
Mismatches: 27 ^= 0.272727272727273
Unsure: 21 ^= 0.212121212121212
Expected good, but wielded bad: 9 ^= 0.18
Expected good, but wielded unsure: 9 ^= 0.18
Expected good, and wielded good: 32 ^= 0.64
Expected bad, but wielded good: 18 ^= 0.36734693877551
Expected bad, but wielded unsure: 12 ^= 0.244897959183673
Expected bad, and wielded bad: 19 ^= 0.387755102040816
./bayesian-tagger.pl -nt media__mail
Tested packages: 271
Expected to be good: 136
Expected to be bad: 135
Matches: 213 ^= 0.785977859778598
Mismatches: 36 ^= 0.132841328413284
Unsure: 22 ^= 0.0811808118081181
Expected good, but wielded bad: 1 ^= 0.00735294117647059
Expected good, but wielded unsure: 5 ^= 0.0367647058823529
Expected good, and wielded good: 130 ^= 0.955882352941177
Expected bad, but wielded good: 35 ^= 0.259259259259259
Expected bad, but wielded unsure: 17 ^= 0.125925925925926
Expected bad, and wielded bad: 83 ^= 0.614814814814815
./bayesian-tagger.pl -nt special__meta
Tested packages: 81
Expected to be good: 41
Expected to be bad: 40
Matches: 75 ^= 0.925925925925926
Mismatches: 4 ^= 0.0493827160493827
Unsure: 2 ^= 0.0246913580246914
Expected good, but wielded bad: 2 ^= 0.0487804878048781
Expected good, but wielded unsure: 1 ^= 0.024390243902439
Expected good, and wielded good: 38 ^= 0.926829268292683
Expected bad, but wielded good: 2 ^= 0.05
Expected bad, but wielded unsure: 1 ^= 0.025
Expected bad, and wielded bad: 37 ^= 0.925
./bayesian-tagger.pl -nt use__configuring/
Tested packages: 126
Expected to be good: 63
Expected to be bad: 63
Matches: 99 ^= 0.785714285714286
Mismatches: 11 ^= 0.0873015873015873
Unsure: 16 ^= 0.126984126984127
Expected good, but wielded bad: 5 ^= 0.0793650793650794
Expected good, but wielded unsure: 3 ^= 0.0476190476190476
Expected good, and wielded good: 55 ^= 0.873015873015873
Expected bad, but wielded good: 6 ^= 0.0952380952380952
Expected bad, but wielded unsure: 13 ^= 0.206349206349206
Expected bad, and wielded bad: 44 ^= 0.698412698412698
./bayesian-tagger.pl -nt hwtech::cd
Tested packages: 40
Expected to be good: 20
Expected to be bad: 20
Matches: 31 ^= 0.775
Mismatches: 5 ^= 0.125
Unsure: 4 ^= 0.1
Expected good, but wielded bad: 1 ^= 0.05
Expected good, but wielded unsure: 0 ^= 0
Expected good, and wielded good: 19 ^= 0.95
Expected bad, but wielded good: 4 ^= 0.2
Expected bad, but wielded unsure: 4 ^= 0.2
Expected bad, and wielded bad: 12 ^= 0.6
./bayesian-tagger.pl -nt interface__commandline/
Tested packages: 104
Expected to be good: 52
Expected to be bad: 52
Matches: 59 ^= 0.567307692307692
Mismatches: 29 ^= 0.278846153846154
Unsure: 16 ^= 0.153846153846154
Expected good, but wielded bad: 5 ^= 0.0961538461538462
Expected good, but wielded unsure: 6 ^= 0.115384615384615
Expected good, and wielded good: 41 ^= 0.788461538461538
Expected bad, but wielded good: 24 ^= 0.461538461538462
Expected bad, but wielded unsure: 10 ^= 0.192307692307692
Expected bad, and wielded bad: 18 ^= 0.346153846153846
./bayesian-tagger.pl -nt data__font/
Tested packages: 30
Expected to be good: 15
Expected to be bad: 15
Matches: 30 ^= 1
Mismatches: 0 ^= 0
Unsure: 0 ^= 0
Expected good, but wielded bad: 0 ^= 0
Expected good, but wielded unsure: 0 ^= 0
Expected good, and wielded good: 15 ^= 1
Expected bad, but wielded good: 0 ^= 0
Expected bad, but wielded unsure: 0 ^= 0
Expected bad, and wielded bad: 15 ^= 1
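
For readers who want to check the figures: the overall ratios above are
fractions of all tested packages, while the "Expected good/bad" ratios
are fractions of the packages expected to be in that class. The
following Python sketch recomputes them from (expected, predicted)
pairs. It is only an illustration under that reading of the output; the
function name summarize and the data layout are hypothetical and not
taken from bayesian-tagger.pl.

```python
from collections import Counter


def summarize(pairs):
    """Summarize (expected, predicted) pairs.

    expected is 'good' or 'bad'; predicted is 'good', 'bad' or 'unsure'.
    """
    counts = Counter(pairs)
    tested = sum(counts.values())
    matches = counts[("good", "good")] + counts[("bad", "bad")]
    mismatches = counts[("good", "bad")] + counts[("bad", "good")]
    unsure = counts[("good", "unsure")] + counts[("bad", "unsure")]
    # Number of packages expected to fall into each class.
    expected = {
        label: sum(n for (exp, _), n in counts.items() if exp == label)
        for label in ("good", "bad")
    }
    # Per-class ratios are relative to the expected size of that class.
    per_class = {
        (exp, got): counts[(exp, got)] / expected[exp]
        for exp in ("good", "bad")
        for got in ("good", "bad", "unsure")
        if expected[exp]
    }
    return {
        "tested": tested,
        "matches": matches,
        "mismatches": mismatches,
        "unsure": unsure,
        "expected": expected,
        "per_class": per_class,
    }


# Counts taken from the role__server run above.
role_server = (
    [("good", "good")] * 226 + [("good", "bad")] * 10
    + [("good", "unsure")] * 10 + [("bad", "bad")] * 156
    + [("bad", "good")] * 43 + [("bad", "unsure")] * 46
)
summary = summarize(role_server)
print(summary["tested"], summary["matches"],
      summary["mismatches"], summary["unsure"])
print(round(summary["matches"] / summary["tested"], 3))
```

For the role__server counts this gives 491 tested, 382 matches, 53
mismatches and 56 unsure, in line with the run reported above.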