[Debtags-devel] Re: Recent progress

Tue, 8 Mar 2005 13:15:00 +0100

On Mon, Mar 07, 2005 at 08:07:50PM +0100, Benjamin Mesing wrote:

> > > Searching a package can be split into two steps, first searching the
> > > correct tags and second searching the packages in the result.

> > Yes, but while it may be desirable to show the two steps explicitly,
> > they can be transparently combined.

> And how would you do that??

Here is what comes to my mind without thinking too much about it.

The user types keywords. First, these keywords are mapped to tags.
If a keyword _is_ a tag name, give that tag the maximum score.
If it's a tag synonym, give it a slightly lower score.
Now look at the tag descriptions and infer possible tags from the
keyword, giving the inferred tags a lower score.
Compute the list of packages carrying these tags, and give each
package a score that reflects the scores of its tags. You now have
a list of packages sorted by score (that is, by likely relevance to
the keyword list; nothing will ever be perfect).
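
A rough sketch of that scoring, in Python. Everything concrete here is
an assumption made for illustration: the score values, the shape of the
vocabulary dictionary, the function names, and the choice to sum tag
scores per package. None of it is existing debtags code.

```python
# Hypothetical score levels: exact tag name > synonym > inferred from description.
EXACT, SYNONYM, INFERRED = 1.0, 0.8, 0.4


def score_tags(keywords, vocabulary):
    """Map keywords to tags, keeping the best score seen for each tag.

    vocabulary is assumed to look like:
        {tag_name: {"synonyms": set_of_str, "description": str}}
    """
    scores = {}
    for kw in (k.lower() for k in keywords):
        for tag, info in vocabulary.items():
            if kw == tag.lower():
                s = EXACT
            elif kw in {x.lower() for x in info.get("synonyms", ())}:
                s = SYNONYM
            elif kw in info.get("description", "").lower().split():
                s = INFERRED
            else:
                continue
            scores[tag] = max(scores.get(tag, 0.0), s)
    return scores


def rank_packages(tag_scores, package_tags):
    """Score each package as the sum of the scores of its matching tags.

    package_tags is assumed to look like {package_name: set_of_tag_names}.
    Returns (package, score) pairs, best first, ties broken by name.
    """
    ranked = []
    for pkg, tags in package_tags.items():
        total = sum(tag_scores.get(t, 0.0) for t in tags)
        if total > 0:
            ranked.append((pkg, total))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

With a made-up vocabulary where "mail" has the synonym "email", the
keyword "email" gives the tag "mail" the synonym score, and every
package tagged "mail" ends up in the result. Summing is only one way
to reflect tag scores in a package score; taking the maximum would
favour packages with one strong match over several weak ones.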

You also can combine this with a full text search in package descriptions
with the same scoring strategy, and combine the results.
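
One possible way to combine the two, again only as a sketch: the
functions, the 0.7 weight, and the "fraction of keywords found" measure
are all assumptions for illustration, not something that exists.

```python
def fulltext_scores(keywords, descriptions):
    """Score each package by the fraction of keywords found in its description.

    descriptions is assumed to look like {package_name: description_text}.
    Packages matching no keyword are left out.
    """
    wanted = [k.lower() for k in keywords]
    scores = {}
    for pkg, text in descriptions.items():
        words = set(text.lower().split())
        hits = sum(1 for k in wanted if k in words)
        if hits:
            scores[pkg] = hits / len(wanted)
    return scores


def combine(tag_based, text_based, tag_weight=0.7):
    """Merge two {package: score} dicts with a weighted sum.

    tag_weight is a hypothetical tuning knob: how much to trust the
    tag-based score compared with the full text score.
    """
    merged = {}
    for pkg in set(tag_based) | set(text_based):
        merged[pkg] = (tag_weight * tag_based.get(pkg, 0.0)
                       + (1.0 - tag_weight) * text_based.get(pkg, 0.0))
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

A package found by both searches then ranks above one found by only
one of them, which seems to be the behaviour one would want.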

> > > So in the end it comes down to what we consider to be more important -
> > > exact search results or a vocabulary with a low complexity reducing the
> > > complexity of the first step and the tagging process.

> > > You vote for the first, me for the latter and I don't know who is
> > > right :-(

> > :-)
> > I think things must be as complex as they can be for people who can
> > afford dealing with power (backend), and as simple as possible for people
> > who don't care (UI). That's the UNIX philosophy, isn't it?

> Ok, you're right again. You were able to resolve my doubts regarding the
> end user.
> However there is the issue of tagging. The more complex the database,
> the more likely it will be that maintainers do not tag their packages
> correctly.

Not if a UI helps them.

> I know you will say that this is a UI issue :-)

Oops, see above. :-)

> But I think
> we have to live with the fact that very different UIs will be around
> (web, GUI Application, console tools,...) and some will not do a very
> good job.

While it would be good to have plenty of tools for browsing, I think
it would be a mistake to let tag maintainers use different, possibly
inefficient or inconsistent tools for editing and adding tags.
The best approach would be a Web interface.  While other UIs could
perfectly be imagined, it would be a pity to scatter efforts in the
first place.

> So the data might be bad - which makes the whole system look
> bad.

We cannot make the data look good by magic. All we can do is to make
the maintenance process easy with a good UI and keep track of changes
so that we know the state of each package (however good or bad it
may be).

> -- But when thinking about it (sorry for thinking out loud :-) this
> might be solved by the community - like Brian suggested with a wiki
> approach

I really don't like it for this purpose, as it is not automated.

> or with the simple mechanism of reportbug.

Reportbug to report incorrect tagging?  Why not, but it would be
cleverer to allow people to directly edit tags and correct things
right away. As a refinement, we could imagine a moderation process (by
the tag team and the package maintainers).

> So finally there is not much I can say against a complex vocabulary any
> more... But please don't let it become too frayed :-))

I think there's not much to worry about if we do things right.

 Hervé

-- 
 _
(°=  Hervé Eychenne
//)  Homepage:          http://www.eychenne.org/
v_/_ WallFire project:  http://www.wallfire.org/