Faceted tags

Erich Schubert erich@debian.org
Tue, 06 Apr 2004 23:03:05 +0200


Hi Enrico,

>  - they are semantically invariant (for example, the property "colour"
>    can assume many different values, but it's an invariant concept: an
>    object will always have a colour)

IMHO this is a bad example. Objects might have multiple colours, and
which colour would you assign to glass?

> I'd thus define "facet" as the "dimension", or "axis" of categorization
> and "tag" as the value along the dimension/axis.

If you are talking about a dimension or axis you expect things to have
exactly one value upon the axis; maybe an interval. You expect an axis
to be ordered, too.

> So, in our debtags domain, possible facets and relative tags can be:
>  - Supported file formats
>    (MP3, OGG, PDF...)

... and here you certainly will have applications which "have to do
with" mp3 as well as ogg.

I think we should consider each tag to be an axis itself. Most will be
binary: a tag either applies or doesn't apply. There are a few cases where this is
different, for example "maturity", and maybe "freedom-of-use".
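
To make the "each tag is its own axis" idea concrete, here is a rough
sketch. Everything in it is an assumption for illustration only: the
PackageTags class, the Maturity levels and the example tags are made up
and are not part of debtags or libtagcoll. Binary tags are just a set
(present means "applies"), while the few non-binary axes such as
"maturity" carry an ordered value.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Maturity(IntEnum):
    # Hypothetical ordered levels for the non-binary "maturity" axis.
    EXPERIMENTAL = 1
    BETA = 2
    STABLE = 3


@dataclass
class PackageTags:
    # Binary axes: a tag is in the set if and only if it applies.
    binary: set = field(default_factory=set)
    # Non-binary axes: axis name -> ordered value.
    valued: dict = field(default_factory=dict)

    def applies(self, tag):
        """True if the binary tag applies to this package."""
        return tag in self.binary

    def at_least(self, axis, minimum):
        """True if the package has a value on `axis` that is >= minimum."""
        value = self.valued.get(axis)
        return value is not None and value >= minimum


# A made-up package that "has to do with" both mp3 and ogg.
player = PackageTags(binary={"mp3", "ogg"},
                     valued={"maturity": Maturity.BETA})
```

With this layout a package can carry mp3 and ogg at the same time, and
only the handful of ordered axes need comparison logic.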

>  - User interface toolkit
>    (GTK, QT, GNUStep...)

Be careful with that one; someone was very unhappy with us putting
GNUStep in the UI/DE section; it's an application framework. ;-)

I think the term "namespaces" is better for the outer terms; facets for
the items is okay.

> For example, now we know that a first set of facets and tags can be
> defined now, and that facets can be added and expanded later.  So, there
> is no need to define a special proper set of tags.  It'd be interesting,
> instead, to make a good work to define a good initial set of facets to
> start working with.

Actually that is one of the things I learned from my experiments: adding
tags later on does work, but you would need to re-tag lots of
applications. Adding tags should be avoided; if possible it should be
done in a batch job so you can actually re-tag everything.
Changing tags is even more of a hassle, so IMHO we really should spend a
lot of time on writing a proper tag set that contains like 99% of the
tags we are going to have at the end.
Removing tags is _way_ easier.

> And here we know that if GNUStep is something inbetween of a widget
> toolkit and a desktop environment, then it should be categorized along
> at least two facets/dimensions/axis: "Widget Toolkit" and "Desktop
> Environment".

GNUStep is not a package. We categorize packages.
It means our "GNUStep" tag has to be split and renamed.

> I find that grouping tags in facets/dimensions/axes makes the
> catalogation work much easier, because the meaning of each tag is
> much, much more clear, having the context attached.

Yeah, I started to do this with my namespaces; instead of putting them
into "dimensions" or "axes" I had put them into a tree (actually a
network, a DAG) hierarchy using implications. This is more flexible.

> I plan to write some special support for facets in libtagcoll and the
> various related applications, as for example one may want to query for
> administration::* or for *::html, or to list all tags in a given facet.
> But yes, all existing tools already work great today!

You don't need this kind of special handling if you use implications.
administration::something should imply "top-level" administration,
whereas "file-format::html" should imply "html".
Using implications also allows us to group for example html, txt, tex as
"text file formats" and mp3, ogg, wav as "audio file formats".
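
A minimal sketch of how that could work, assuming a plain table of
direct implications. The IMPLIES table, the tag spellings
"text-file-format" and "audio-file-format", the helper names and the
package names are all hypothetical; this is not how libtagcoll stores
or resolves anything. A query for a general tag then simply matches
every package whose expanded tag set contains it.

```python
# Hypothetical implication table: tag -> tags it directly implies.
# Together the entries form a DAG.
IMPLIES = {
    "administration::something": {"administration"},
    "file-format::html": {"html"},
    "html": {"text-file-format"},
    "txt": {"text-file-format"},
    "tex": {"text-file-format"},
    "mp3": {"audio-file-format"},
    "ogg": {"audio-file-format"},
    "wav": {"audio-file-format"},
}


def expand(tags):
    """Return the given tags plus everything they imply, transitively."""
    result = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in result:
            continue  # already visited; also guards against accidental cycles
        result.add(tag)
        stack.extend(IMPLIES.get(tag, ()))
    return result


def packages_with(tag, tagged_packages):
    """Names of packages whose expanded tag set contains `tag`, sorted."""
    return sorted(name for name, tags in tagged_packages.items()
                  if tag in expand(tags))


# Made-up packages carrying only their most specific tags.
PACKAGES = {
    "example-web-editor": {"file-format::html"},
    "example-player": {"mp3", "ogg"},
    "example-user-manager": {"administration::something"},
}
```

Here packages_with("administration", PACKAGES) finds
example-user-manager without any special administration::* wildcard
support, because the specific tag implies the top-level one.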

> Since the meaning of tags can now be much clearer, we could definitely
> think about trying to let maintainers tag their packages.  I start
> seeing no need for a tag task force doing everything, but maybe some tag
> consultants that give advice and maybe override overstupid things.

I think we need to properly define for many tags when they are to be
used and when not. We also should consider making a difference between
"can read mp3" and "can modify mp3", for example.

Greetings,
Erich Schubert
-- 
     erich@(vitavonni.de|debian.org)    --    GPG Key ID: 4B3A135C    (o_
   A polar bear is a rectangular bear after a coordinate transform.   //\
  In unseren Freunden suchen wir, was uns fehlt. --- Thornton Wilder  V_/_