[Debtags-devel] New [related] link in search.cgi

Torsten Marek shlomme at gmx.net
Fri Nov 18 20:14:35 UTC 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Enrico Zini wrote:
> Hello,
> 
> In private mail, one visitor suggested that I add a [related] link to the
> package list that would start a new search with the package name.
> 
> What he was doing was searching for the package name instead of keywords;
> that would at least match the package, bring the package's tags into
> the 'Wanted' taglist, and show other packages with the same tags.
> 
> I hadn't realized that this would work, but it seems that it would!
> 
> It would be interesting, in that case, to score the tags by 'relevance'.
> I think TFIDF[1] could be used to say how relevant a tag is for a package
> compared to the rest of the collection.  Would anyone like to
> experiment?

Hi Enrico,

my knowledge of tf.idf is limited, but what would you consider the term frequency
here? The count of a tag for a given package is always 1, so you're left with
idf only.
Still, it would be useful. The relevance of a tag some::tag for a package (one
that has been tagged with it, that is) would then be
1 / docfrequency(some::tag)
or, at your option, the logarithmic form log(N / docfrequency(some::tag)), where
N is the total number of packages.
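
A minimal sketch of the idf-only scoring described above, in Python. The
package names, the tags, and the helper names idf_scores and rank_tags are
made up for illustration; nothing here is taken from the debtags code, and
the logarithmic form log(N / df) is assumed as the score.

```python
import math
from collections import Counter


def idf_scores(package_tags):
    """Return {tag: log(N / df)} for a mapping of package -> set of tags.

    N is the number of packages and df is the number of packages that
    carry the tag, so rare tags score higher than common ones.
    """
    n_packages = len(package_tags)
    doc_freq = Counter(tag for tags in package_tags.values() for tag in set(tags))
    return {tag: math.log(n_packages / df) for tag, df in doc_freq.items()}


def rank_tags(package, package_tags):
    """List the tags of one package, most relevant (rarest) first."""
    scores = idf_scores(package_tags)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(package_tags[package], key=lambda tag: (-scores[tag], tag))


# Hypothetical toy collection, for illustration only.
PACKAGES = {
    "mutt": {"mail::user-agent", "interface::text-mode"},
    "pine": {"mail::user-agent", "interface::text-mode"},
    "vim": {"use::editing", "interface::text-mode"},
    "gimp": {"use::editing", "interface::x11"},
}

if __name__ == "__main__":
    for name in sorted(PACKAGES):
        print(name, rank_tags(name, PACKAGES))
```

In this toy collection interface::text-mode is carried by three of the four
packages, so it ranks below the rarer tags of any package that has it.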

What would also be possible is to calculate the relevance of a tag for a given
search keyword, instead of just finding the tags with the highest frequency.
That is computationally more expensive, since it involves computing
P(tag|keyword), P(!tag|keyword), P(tag|!keyword) and P(!tag|!keyword). I'm
a master of neither information retrieval nor statistics, but I've done
some study-related work in that direction lately.
Is that somewhere along your lines of thought?
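
As a rough sketch of what computing those four probabilities could look like,
assuming each package has a set of tags and a set of keywords (say, words
from its description). The data layout and the name conditional_probs are
assumptions for illustration, and the sketch stops at the probabilities
themselves, since which measure to build on top of them is left open here.

```python
def conditional_probs(tag, keyword, package_tags, package_keywords):
    """Estimate the four tag/keyword conditionals by counting packages.

    package_tags:     mapping of package -> set of tags
    package_keywords: mapping of package -> set of keywords (assumed input)
    """
    with_kw = [p for p in package_tags if keyword in package_keywords.get(p, set())]
    without_kw = [p for p in package_tags if keyword not in package_keywords.get(p, set())]
    if not with_kw or not without_kw:
        raise ValueError("keyword must match some packages, but not all of them")

    def share_tagged(packages):
        # Fraction of the given packages that carry the tag.
        return sum(tag in package_tags[p] for p in packages) / len(packages)

    p_tag_kw = share_tagged(with_kw)
    p_tag_not_kw = share_tagged(without_kw)
    return {
        "P(tag|keyword)": p_tag_kw,
        "P(!tag|keyword)": 1.0 - p_tag_kw,
        "P(tag|!keyword)": p_tag_not_kw,
        "P(!tag|!keyword)": 1.0 - p_tag_not_kw,
    }


# Hypothetical toy data, for illustration only.
TAGS = {
    "mutt": {"mail::user-agent", "interface::text-mode"},
    "pine": {"mail::user-agent", "interface::text-mode"},
    "vim": {"use::editing", "interface::text-mode"},
    "gimp": {"use::editing", "interface::x11"},
}
KEYWORDS = {
    "mutt": {"mail", "reader"},
    "pine": {"mail", "news"},
    "vim": {"editor", "text"},
    "gimp": {"editor", "image"},
}

if __name__ == "__main__":
    print(conditional_probs("mail::user-agent", "mail", TAGS, KEYWORDS))
```

Each call scans the whole collection for one (tag, keyword) pair, which is
where the extra cost compared to plain tag frequencies comes from.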

greetings

Torsten

- --
Torsten Marek <shlomme at gmx.net>
ID: A244C858 -- FP: 1902 0002 5DFC 856B F146  894C 7CC5 451E A244 C858
Keyserver: subkeys.pgp.net

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDfjYrfMVFHqJEyFgRAsUUAJ47kWNGeCf69QjckxrUj/arGmslpACfdgsT
xQKNtJGIiE3DgOYm1DdVrDE=
=eaAb
-----END PGP SIGNATURE-----


