A couple of questions

Enrico Zini enrico at enricozini.org
Sun Dec 3 21:47:10 CET 2006


Hi Frank, welcome back!

On Sun, Dec 03, 2006 at 06:53:36PM +0100, Frank Lichtenheld wrote:

> 1) I currently use http://debtags.alioth.debian.org/tags/vocabulary.gz
>    to assign names and descriptions to tags. Is this the canonical
>    method to do that or is there a better way?

That's the list that is used for pretty much everything.

Alternatives are:
 - get it directly from svn:
   svn://svn.debian.org/debtags/vocabulary/trunk/debian-packages
 - get the data for Javascript:
   http://debtags.alioth.debian.org/js/vocabulary.js
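
A minimal sketch of reading tag names and short descriptions out of a
downloaded vocabulary.gz. It assumes the file uses Debian-control-style
stanzas separated by blank lines, with "Tag:" and "Description:" fields
and indented continuation lines. The function names parse_vocabulary
and load_vocabulary are made up for illustration.

```python
import gzip


def parse_vocabulary(text):
    """Map each tag name to its short description.

    Assumed format: stanzas separated by blank lines, "Field: value"
    lines, and continuation lines that start with whitespace.
    """
    tags = {}
    fields = {}
    # The extra "" makes sure the last stanza is flushed too.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if "Tag" in fields:
                tags[fields["Tag"]] = fields.get("Description", "")
            fields = {}
        elif not line[0].isspace() and ":" in line:
            # Split on the first colon only, so "devel::compiler" survives.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        # Continuation lines (long descriptions) are skipped in this sketch.
    return tags


def load_vocabulary(path):
    """Read a local copy of vocabulary.gz (the path is the caller's choice)."""
    with gzip.open(path, "rt", encoding="utf-8") as fd:
        return parse_vocabulary(fd.read())
```

Stanzas without a "Tag:" field, such as facet definitions, are ignored
here; keeping them would only take a second dictionary.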

> 1a) What is the difference between vocabulary.gz and vocabularu.gz?

*Gasp!* A typo in the daily maintenance script.  Thank you so much for
spotting it!  I fixed it.
I have now deleted vocabularu.gz, and vocabulary.gz gets updated as it
should.


> 2) I would like to try to offer something like "debtags related" on
>    the packages sites. Doing this on the fly is way too slow though.
>    Would it perhaps make sense to create a central database with
>    this kind of information?

I can implement it in debtagsd, then you can access it on the fly and
hopefully fast.

The problem is, it's hard to define when two packages are related by
looking at tags, without allowing the user to tweak the distance.

Or, one can auto-increase the distance until one gets a decent number of
matches, but that is expensive (it repeats the search multiple times).
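
A rough sketch of that auto-increase approach, to show where the cost
comes from. It assumes the distance between two packages is the number
of tags that only one of them has; related_packages, tagdb and the
default thresholds are hypothetical and not the debtagsd
implementation.

```python
def tag_distance(a, b):
    """Number of tags present in exactly one of the two tag sets."""
    return len(a ^ b)


def related_packages(name, tagdb, wanted=5, max_distance=10):
    """Widen the distance until at least `wanted` packages match.

    tagdb maps package names to sets of tags. Returns the distance
    that was reached and the sorted list of matching packages.
    """
    own = tagdb[name]
    matches = []
    for distance in range(max_distance + 1):
        # Each round rescans the whole collection: this is the
        # repeated search that makes the approach expensive.
        matches = [
            other
            for other, tags in tagdb.items()
            if other != name and tag_distance(own, tags) <= distance
        ]
        if len(matches) >= wanted:
            break
    return distance, sorted(matches)
```

Computing every distance once and sorting by it would avoid the
repeated scans, at the price of keeping all the distances in memory.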

Or, one can use Xapian
http://lists.alioth.debian.org/pipermail/debtags-devel/2006-November/001446.html
[I still need to get back to that mail, sorry Olly]

Currently, I don't feed the tags to Xapian, only the descriptions.
You can however set up a Xapian database for packages.debian.org and
feed it the tags as well as the descriptions: that way you get, for
free, a full text search which can also tell you "similar packages"
and does so by taking tag data into account.


> 3) I find the smart search interesting as a concept, but I quite frankly
>    don't understand most of the time what it does and why. E.g.:
>    3.1) What exactly does "Wanted" and "Available" mean for the tags?
>         Or better: How are the initial "Wanted" tags computed?

The system uses the keywords to compute a list of tags: first I do a
full text search on the package descriptions, then I look at how the
statistical distribution of tags changes in those search results.

The result of this is a list of tags sorted by "relevance" to your
keywords: the first tag is the one most related to the keywords, and so
on.  This is the initial list of 'available' tags.
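
One way to picture "how the statistical distribution of tags changes"
is to compare how often a tag appears in the search results with how
often it appears in the whole collection. The sketch below uses that
ratio as the relevance score; the ratio is an assumption made for
illustration, since the exact measure is not spelled out here, and
rank_tags and tagdb are hypothetical names.

```python
from collections import Counter


def rank_tags(results, tagdb):
    """Sort the tags seen in `results` by over-representation.

    results is a list of package names returned by the full text
    search; tagdb maps every package name to its set of tags.
    """
    overall = Counter(t for tags in tagdb.values() for t in tags)
    found = Counter(t for pkg in results for t in tagdb.get(pkg, ()))

    def score(tag):
        share_in_results = found[tag] / len(results)
        share_overall = overall[tag] / len(tagdb)
        return share_in_results / share_overall

    # Highest score first; ties are broken alphabetically.
    return sorted(found, key=lambda t: (-score(t), t))
```

A tag carried by nearly every package scores close to 1 and sinks to
the bottom, while a tag that is rare overall but common in the results
rises to the top.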

From the list of 'available' tags you can choose the ones that you
'want': those that represent what you are looking for.

The ssearch.html page does an extra step and selects some of the most
relevant tags for you.  It does so by repeatedly "clicking" on the most
relevant 'available' tag, and it stops when the result set becomes too
small.
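
That extra step can be sketched as a greedy loop. The relevance score
(share of a tag in the current results divided by its share in the
whole collection), the min_results threshold and the function names are
all assumptions made for illustration, not the code behind
ssearch.html.

```python
from collections import Counter


def matching(tagdb, candidates, wanted):
    """Packages among `candidates` that carry every wanted tag."""
    return [p for p in candidates if wanted <= tagdb[p]]


def autoselect(tagdb, results, min_results=2):
    """Keep picking the most relevant available tag while at least
    `min_results` packages remain. Returns (wanted tags, packages)."""
    overall = Counter(t for tags in tagdb.values() for t in tags)
    wanted = set()
    current = list(results)
    while True:
        found = Counter(t for p in current for t in tagdb[p])
        # Sorted so that ties are resolved the same way on every run.
        available = sorted(t for t in found if t not in wanted)
        if not available:
            break
        best = max(
            available,
            key=lambda t: (found[t] / len(current)) / (overall[t] / len(tagdb)),
        )
        narrowed = matching(tagdb, current, wanted | {best})
        if len(narrowed) < min_results:
            break  # one more tag would make the result set too small
        wanted.add(best)
        current = narrowed
    return wanted, current
```

Calling matching() again with a tag taken out of the wanted set gives
the broader selection back.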

>    3.2) What does the initial keyword search search for exactly? E.g. why
>         doesn't "gcc" find the libgccX packages, or why does "apache"
>         find only apache modules but not apache itself?

gcc doesn't show the 'libgccX' packages because devel::compiler gets
autoselected and libgcc1 is (rightfully) not tagged as devel::compiler.

apache itself doesn't show because role::plugin gets autoselected and
apache is not a plugin.

You can click on the tags you don't want to remove them from the
'wanted' set and broaden the selection.


>    3.3) The page really should have a "Reset" button to be able to
>         search across all packages again. One can do this by a "hard"
>         reload, too, but it's not exactly intuitive...

Added!

>    3.4) Bug: Broadening the search by removing tags again doesn't seem
>         to work at all

It seems to work now.  It was buggy, then I reloaded the page to see if
the "Reset" button change had made its way to the site, and broadening
the tags started to work.


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at debian.org>