[Debtags-devel] Second Preview

Benjamin Mesing bensmail@gmx.net
Tue, 12 Oct 2004 21:18:09 +0200


--=-UsKJg3EZU5ZNs2+pyWn2
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hello, 

Today I incorporated the use of the dependencies and provides. It is
starting to get a little slow, but is still bearable. The precision
seems to have improved dramatically (as expected):

        uitoolkit::gtk
        Tested packages: 741
        Expected to be good: 615
        Expected to be bad: 126
        Matches: 693 ^= 0.935222672064777
        Mismatches: 48 ^= 0.0647773279352227
        Expected good, but yielded bad: 18 ^= 0.0292682926829268
        Expected bad, but yielded good: 30 ^= 0.238095238095238
        
        uitoolkit::qt
        Tested packages: 457
        Expected to be good: 324
        Expected to be bad: 133
        Matches: 425 ^= 0.929978118161926
        Mismatches: 32 ^= 0.0700218818380744
        Expected good, but yielded bad: 12 ^= 0.037037037037037
        Expected bad, but yielded good: 20 ^= 0.150375939849624
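
For anyone who wants to check the arithmetic, the ratios in the two
reports can be recomputed from four raw counts per tag. This is only a
minimal sketch in Python; the function name and the dictionary keys are
made up for illustration and are not taken from the attached tagger.
Note that the two error ratios are relative to the size of their own
group (expected good or expected bad), not to all tested packages.

```python
def tag_report(expected_good, expected_bad, good_but_bad, bad_but_good):
    """Recompute the ratios of one report from its four raw counts."""
    tested = expected_good + expected_bad
    mismatches = good_but_bad + bad_but_good
    matches = tested - mismatches
    return {
        "tested": tested,
        "matches": matches,
        "mismatches": mismatches,
        # Share of all tested packages that were classified as expected.
        "match_rate": matches / tested,
        "mismatch_rate": mismatches / tested,
        # Share of the expected-good packages that were classified bad.
        "good_yielded_bad": good_but_bad / expected_good,
        # Share of the expected-bad packages that were classified good.
        "bad_yielded_good": bad_but_good / expected_bad,
    }


# Counts copied from the two reports above.
gtk = tag_report(615, 126, 18, 30)
qt = tag_report(324, 133, 12, 20)

for name, report in (("uitoolkit::gtk", gtk), ("uitoolkit::qt", qt)):
    print(name, report["tested"], report["matches"], report["match_rate"])
```

The "Matches" ratio computed this way is the share of correct decisions
over all tested packages, which is what I loosely call precision here.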

Nevertheless, the results are still not acceptable, so I will try to
fine-tune further, even though I somehow doubt that we can reach 95%
precision on average, which would mean worse results for some tags. The
Qt and GTK tags have quite a large training set, so they are among the
"good" cases. We don't need to bother with tags that have only 10
packages already tagged, because we could just as well roll dice.

I will try to improve precision in the coming days and report back then.
One parameter I am unsure about is how many bad packages should be used
for training. Perhaps as many as good ones, perhaps more? Here the
tagged_complete tag would come in handy, because we could use this tag
to select the bad packages:
        debtags grep "! data::font && special::completly-tagged"
instead of
        debtags grep "! data::font && ! special::not-yet-tagged"
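
To make the open question about the number of bad packages concrete,
here is a rough sketch of how the good/bad ratio could be turned into a
parameter. Everything in it is an assumption for illustration: the
function name, the bad_per_good parameter, the in-memory dictionary of
tags and the toy package names are hypothetical, and it is not what the
attached tagger does. Bad examples are only drawn from packages that
carry the "completely tagged" marker, since only for those is a missing
tag meaningful.

```python
import random


def pick_training_sets(tags_by_package, tag, complete_tag,
                       bad_per_good=1.0, seed=0):
    """Choose good and bad training packages for one tag.

    tags_by_package maps a package name to the set of its tags.
    bad_per_good is the number of bad examples wanted per good example.
    """
    good = sorted(p for p, tags in tags_by_package.items() if tag in tags)
    # A package counts as a reliable bad example only if it lacks the tag
    # and is marked as completely tagged.
    candidates = sorted(
        p for p, tags in tags_by_package.items()
        if tag not in tags and complete_tag in tags
    )
    wanted = min(len(candidates), round(len(good) * bad_per_good))
    # Fixed seed so that repeated runs train on the same sample.
    bad = random.Random(seed).sample(candidates, wanted)
    return good, bad


# Toy data, invented for this example.
packages = {
    "font-a": {"data::font", "special::completly-tagged"},
    "font-b": {"data::font"},
    "editor-a": {"special::completly-tagged"},
    "editor-b": {"special::completly-tagged", "uitoolkit::gtk"},
    "game-a": {"special::completly-tagged"},
    "unknown-a": set(),
}

good, bad = pick_training_sets(
    packages, "data::font", "special::completly-tagged", bad_per_good=1.0
)
print(good, bad)
```

With bad_per_good=1.0 this picks as many bad packages as good ones;
raising it would test the "perhaps more?" variant without touching the
rest of the training code.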


That's all for now.

Greetings Ben

--=-UsKJg3EZU5ZNs2+pyWn2
Content-Disposition: attachment; filename=bayesianTagger_2004-10-12.tar.bz2
Content-Type: application/x-bzip-compressed-tar; name=bayesianTagger_2004-10-12.tar.bz2
Content-Transfer-Encoding: base64


--=-UsKJg3EZU5ZNs2+pyWn2--