Faceted tags

Wed, 7 Apr 2004 19:37:18 +0100

On Wed, Apr 07, 2004 at 01:14:18AM +0200, Erich Schubert wrote:

> > "Transparent".  However, the idea is not to refer to "the specific
> > colour", but to the idea of "colour" itself.  So, "color" is a facet,
> > while "red" is the value of that facet.
> What i wanted to say is: there are so many colors. Like there are only a
> few items that have only one color. Would you really add the value
> "color:red-with-white-stripes-in-spiral-form" for a candy bar?
> or if you would add "color:red, color:white, color:pattern:striped,
> color:pattern:spirals"? (note the "need" for nesting)

I'd say color::red, color::white and forget about the pattern.  When
we feel the need to categorize patterns, we could decide to create a
"pattern" facet with "spirals" or "striped" tags.

I don't necessarily care about categorizing everything right from the
start: I want to start having a useful categorization now, and possibly
one which is easy to extend.

> I wasn't referring to the "what you can do" thing, but to the problem
> that you expect to place a package on only one location on an "axis" or
> "dimension", but you might need multiple.
> How would you classify a cd-ripper?
> use:convert:to:mp3, use:convert:to:ogg, use:convert:from:cdda ?

Like before, I doubt I'll want to be that precise.  Doing use::convert,
tech::mp3, tech::ogg, tech::vorbis, tech::cdda, hardware::cd is
definitely a great step.  Then users could look for converters, select
their favourite encoding and see from the description which is the
direction of the conversion (or maybe find a distinction between "ripper"
and "burner" in another facet).

Once you can narrow down the list of packages to a short list, then it
doesn't take much to work the descriptions out.  And my personal goal is
narrowing, not finding the exact precise task.
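
To make the narrowing concrete, here is a minimal Python sketch.  The
package names and their taggings are made up for illustration; only the
tag names come from the discussion above, and this is not how any
existing tool is implemented.

```python
# Hypothetical sample data: package names and taggings are invented.
PACKAGES = {
    "cdripper": {"use::convert", "tech::mp3", "tech::ogg", "tech::cdda", "hardware::cd"},
    "oggencoder": {"use::convert", "tech::ogg", "tech::vorbis"},
    "cdburner": {"use::burn", "tech::cdda", "hardware::cd"},
    "webbrowser": {"use::browse", "tech::http"},
}


def narrow(packages, wanted):
    """Return the sorted names of packages carrying every tag in `wanted`."""
    wanted = set(wanted)
    return sorted(name for name, tags in packages.items() if wanted <= tags)


# Start broad, then add tags to shorten the list.
converters = narrow(PACKAGES, {"use::convert"})
ogg_converters = narrow(PACKAGES, {"use::convert", "tech::ogg"})
cd_converters = narrow(PACKAGES, {"use::convert", "tech::cdda"})
```

Each extra tag can only shrink the list, so the user stops adding tags
as soon as the list is short enough to read the descriptions.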

If in the future we want to support some automatic foobarization of
converters, we could even come up with concepts like tag predicates such
as "converts(tagset source, tagset dest)".

The possibility is open, but we don't need it today.  The cool thing is
that the data will be there to implement it tomorrow.
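
Just to show that such a predicate could sit on top of plain tags, here
is a rough Python sketch.  Everything in it is an assumption: the
"Converts" class, the package names and the "tech::wav" tag are
hypothetical, and nothing like this exists in the current tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Converts:
    """Hypothetical predicate: converts(tagset source, tagset dest)."""
    source: frozenset
    dest: frozenset


# Hypothetical predicates attached to invented package names.
PREDICATES = {
    "cdripper": [
        Converts(frozenset({"tech::cdda"}), frozenset({"tech::mp3", "tech::ogg"})),
    ],
    "oggdecoder": [
        Converts(frozenset({"tech::ogg"}), frozenset({"tech::wav"})),
    ],
}


def can_convert(predicates, source_tag, dest_tag):
    """Return the sorted packages with a predicate going from source_tag to dest_tag."""
    return sorted(
        name
        for name, preds in predicates.items()
        if any(source_tag in p.source and dest_tag in p.dest for p in preds)
    )
```

The predicate reuses the same tags the packages already carry, which is
why having the plain tagging today does not close the door on this.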

> useful. tagging as
>   use:convert, fileformat:mp3, fileformat:ogg, fileformat:cdda
> can be good enough; and probably will be for the next 3 years.
> tagging as
>   use:convert, fileformat:audio
> probably is too little information.
>
> That is the work we have to do now, finding out how fine-grained we need
> to have people enter the information.

Super-super-agree!

As a rule of thumb, I'd set some rules to follow:

 1) Identify the aspects (facets) which make packages different from
    each other
 2) Create tags to describe the instances of those aspects, making them
    atomic and consistent, so that they can be reused
 3) If huge (>20) numbers of fully-tagged packages still end up with the
    same tagset, look at them, find out what differentiates them and
    create (a) new facet(s)

And also a:
 1a) Identify the most important goals of users searching the archive,
     and see if there are aspects (facets) of packages which can
     differentiate them in a way which is useful for attaining those
     goals
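
Rule 3 is easy to check mechanically.  A minimal Python sketch, assuming
the taggings are available as a mapping from package name to a set of
tags (the sample data below is invented, and the threshold of 20 is just
the figure from the rule):

```python
from collections import defaultdict


def crowded_tagsets(packages, threshold=20):
    """Group packages by identical tagset; keep groups larger than `threshold`."""
    groups = defaultdict(list)
    for name, tags in packages.items():
        groups[frozenset(tags)].append(name)
    return {
        tagset: sorted(names)
        for tagset, names in groups.items()
        if len(names) > threshold
    }


# Hypothetical sample: three packages that the tags cannot tell apart.
SAMPLE = {
    "enc-a": {"use::convert", "tech::ogg"},
    "enc-b": {"use::convert", "tech::ogg"},
    "enc-c": {"use::convert", "tech::ogg"},
    "ripper": {"use::convert", "tech::cdda"},
}

# A low threshold only so that the tiny sample triggers the rule.
crowded = crowded_tagsets(SAMPLE, threshold=2)
```

Every group it returns is a candidate for a new facet: look at the
packages in it and find out what differentiates them.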

> Again, that is one of the things we have to take care of before starting
> tagging; this is what we need the tag task force for.
> Of course using "namespaces" (or "facets" if you prefer that name...)
> for any tag will reduce such mistakes. That certainly is one thing we
> already learned. ;-)
> > And here's a very subtle trick: defining the facet, the point of view,
> > reduces the waterfall effect in case of tag changes.
> Yes, defining the precise meaning of a "namespace" will also define the
> meaning of tags in the namespace.

We agree here: great!  So we can start working out namespace/facet
definitions.

As for the "namespace" or "facet" name, I tend to like "facet" now
because it connects us with a lot of existing Library Science and
Information Architecture literature, and potentially network us with a
lot of work already done/being done.

Last but not least, "facet" gives us a fascinating Indian ancestor: :)

  http://w3.uniroma1.it/vrd/mathematics/i-ranganathan.html

And 5 beautiful laws of library science:

  http://www.mcallen.lib.tx.us/library/ranganat.htm

Which could be reformulated as:

	* The Five Laws of Package Categorization *

	  1. Packages are for use.
	  2. Every user his package.
	  3. Every package his user.
	  4. Save the time of the user.
	  5. Debian is a growing organism.

And looks tremendously cool! :)

Actually, after writing this I found out that someone already did this
association:

  http://www.kmentor.com/socio-tech-info/archives/000079.html

> > So, if a package uses HTTP technology, it'll always use the "HTTP
> > technology" tag no matter what.  I can decide to add a new facet, but on
> > the specific "Technology" facet, that package will always have at least
> > the HTTP tag.
>
> which actually is a badly-defined tag, because you have http-client
> technology, http-server technology, http-proxy technology, probably
> -filters, -tunnels etc.
> So the "facet" itself is not defining a tag strictly enough IMHO.
> And please don't come up with "we can use use:client, use:server" to
> differentiate these. While you can easily derive use:client or
> net:protocol:http from "net:protocol:http:client", the other direction
> just doesn't work (= not uniquely determined) as soon as you have a lot
> of tags - and you intend to have a lot.
> Tagging a package as "webserver" is okay for me, because this is way
> more precise than "net:server" "net:protocol:http" (which apply to a
> transparent proxy as well)

You are not contradicting the example I gave: you are adding a new facet.
And you can see that when you do that, the tech::http remains valid.

Actually, you can find a "web" facet in the database I posted yesterday,
under which there should be a "server" tag.

So, if you're looking for something that talks HTTP, you start with the
Tech facet, and then you can decide to further differentiate with
role::server or role::client or role::proxy.  If querying for tech::http
and role::server gave, say, 7 matches, we wouldn't even need the web::
facet, as the grain would be fine enough.  If the need to look for
web-specific terminology arises, you can create the web:: facet at any
time, with web::server, web::browser, web::proxy and everything, and the
tech:: facet is still valid, and actually a good reference to use when
populating "web::".
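
As a rough Python sketch of that refinement step: given the tags already
selected, count which tags of another facet are still available among
the matching packages.  The package names and taggings are invented for
illustration.

```python
from collections import Counter

# Hypothetical sample data: package names and taggings are invented.
PACKAGES = {
    "httpd": {"tech::http", "role::server"},
    "browser": {"tech::http", "role::client"},
    "cacheproxy": {"tech::http", "role::proxy"},
    "mailer": {"tech::smtp", "role::server"},
}


def facet_of(tag):
    """Return the facet part of a facet::value tag."""
    return tag.split("::", 1)[0]


def refine(packages, selected, facet):
    """Count the tags of `facet` among packages carrying all `selected` tags."""
    selected = set(selected)
    counts = Counter()
    for tags in packages.values():
        if selected <= tags:
            counts.update(t for t in tags if facet_of(t) == facet)
    return dict(counts)


roles_for_http = refine(PACKAGES, {"tech::http"}, "role")
```

If the counts are already small, the grain is fine enough and no extra
facet is needed; if they stay large, that is the hint to add one.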

> > Facets define a well-specified context for a property: the property is
> > that, and that alone.  Tags in facets become as atomic as possible:
> If you want to go for atomic facets you have to store the connections
> between them, too.

That could always be done later, maybe 2 or 3 years from now.  I don't
think we need it now, though.

> > Yes, we did that already with namespaces.  Basically, transitioning to
> > facets boils down to just mandating namespaces.
> Which i certainly will support. ;-)

Supercool!!  That's something we can definitely start working on
together right now, then.
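
Mandating namespaces is also trivial to check.  A minimal Python sketch,
assuming (and this is only my assumption, nothing is decided) that a
valid tag is a lowercase facet name, "::", and a lowercase value:

```python
import re

# Assumed syntax, not an agreed standard: facet "::" value, lowercase,
# with "-" allowed in the facet and "+", "." and "-" allowed in the value.
TAG_RE = re.compile(r"^[a-z0-9][a-z0-9-]*::[a-z0-9][a-z0-9+.-]*$")


def unfaceted(tags):
    """Return the sorted tags that do not follow the facet::value form."""
    return sorted(t for t in tags if not TAG_RE.match(t))


# Hypothetical tagging with two tags that still need a facet.
leftovers = unfaceted({"tech::http", "webserver", "use:convert", "implemented-in::c++"})
```

Running something like this over an existing database would list the
tags that still have to be moved under a facet.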

In the two files I sent to the list there is a list of facets/namespaces
populated with tags.  Some of them I like, some of them not.

I definitely like:
  devel:: (supports the goal of development)
  langdevel:: (supports looking for tools specific to a given language)
  implemented-in:: (supports looking for existing code to reuse)
  media:: (supports looking for software working with a medium of interest)
  tech:: (supports looking for specific technology)
  suite:: (allows differentiating among all the elements that compose
          big pieces of software)
  interface:: (allows to choose the interaction method)
    (actually, I don't think interface::3d belongs there, and I think
    that that "3d" should be defined better.  Maybe by creating specific
    tags under "tech::", like "tech::vrml", "tech::opengl"...)
  uitoolkit:: (allows to look for similarly-looking applications and
               code to reuse)
  culture:: (it gives you things specific to your identity)
  x11:: (nice, clean distinction from the x11 point of view)
  use:: (supports goal-oriented searches, and see also
         http://segusoland.sourceforge.net/ for possible further uses)
  web:: (web is a world by itself, and here's its point of view)
  admin:: (system administration is another world by itself)
  field:: (here we go into Library Science, and we could borrow some tag
           set from there.  It's very cool to categorize also the
	   various books and other free data we started packaging:
	   "anarchism" could go in field::politics, for example)
  game:: (gaming is indeed another world by itself)
  hardware:: (it's great to be able to see things from the point of view
              of something you can really touch!)
...and many others.

I don't like so far:
  role: I lack the cognitive structure to find out the name for
	the point of view under which an application is a client or a
	server.  "daemon" is another thing so far I can't assign a point
	of view (maybe "interface", as a daemon is something that has no
	interface?  Uhm...  it could be)
  data: the intent was to categorize non-software content, as Debian now
        contains manuals, books, artwork, and this part could be
	expanded (I plan to package a kitchen recipe collection, for
	example).  However, the effort so far didn't go very far and
	needs more thinking, de-constructing and clarification of ideas
	:)
  platform: just "laptop" and "embedded"?  Uhm... it's a point of view
        from which I can't see very much, but a point of view I see
	a reason for
  special: it's more like a kind of a catch-all, but it's probably not
        too bad

In the file facets.gz I marked a conversion from existing tags to tags
in this faceted framework.  In faceted_db.gz there are at least 2 days'
worth of conversion work.  If there are some facets/namespaces you like,
you could integrate them in your database as well and see what happens.

Actually, I've tried faceted_db.gz in tagcolledit and found out that
the archive navigation gets very neat.

> Think of the file formats: if you provide the user a list of like 100
> file formats he'll get lost. Having them grouped into video, audio, text
> etc. helps a lot. Having a DAG is even better than a tree (ogg can be
> video or audio, as you already mentioned; avi, asf, quicktime are other
> such encapsulation formats)
>
> Which hierarchy levels are to be shown and which not is a thing the user
> interface should (and can) decide.

This is a problem I recognize, too.  I see it only arising inside facets
with more than 10/15 tags, and more like a problem of how to organize
tags inside a facet.

Narrowing it down to be facet-specific, it could maybe be solved at the
moment by creating hand-crafted, hardcoded, redundant hierarchies to use
when presenting the tag list to users.

It's an easy start, and then we could see what happens.  The existing
tagcoll utilities can be used to create hierarchies from existing
taggings, too.
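
A hand-crafted hierarchy of this kind can be as simple as a hardcoded
table.  A minimal Python sketch, where the group names and their
contents are my own illustration and not a proposal for the real list:

```python
# Hypothetical hand-crafted grouping for the tech:: facet.  It is
# deliberately redundant: tech::ogg is listed under both groups.
GROUPS = {
    "audio": ["tech::mp3", "tech::ogg", "tech::cdda"],
    "video": ["tech::ogg", "tech::avi"],
}


def present(groups, available):
    """Return {group: [tags]} limited to tags in `available`, without empty groups."""
    available = set(available)
    shown = {}
    for group, tags in groups.items():
        visible = [t for t in tags if t in available]
        if visible:
            shown[group] = visible
    return shown


menu = present(GROUPS, {"tech::ogg", "tech::mp3"})
```

Listing a tag under more than one group gives the DAG-like behaviour
without touching the tags themselves: the grouping lives only in the
user interface.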

> > Package identities are captured not by creating specific tags, but by
> > inference.  If a new package comes out which is difficult to
> > categorize, then it may be a hint for a new facet/point of view from
> > which to look at packages, holding the existing ones unchanged.
> Well, you can't infer "webserver" from net:protocol:http and
> use:server - it could be a proxy server, too.

Sure, but you get nearer.  You exclude web browsers, for example.
Other facets can support more inference to get you even further.

> My tag browser would also be happy with fileformat::ogg etc. and
> media::audio - inside fileformat:: it will not show you a list of file
> formats, but instead suggest media::something (since these groups most
> probably will be better balanced)

I can't see how thinking facets wouldn't give that, too.

> > By categorizing with inference, we support a huge amount of cases we
> > haven't thought of from the beginning.  I see this as extremely
> > important, as I definitely want to assume that we won't be able to think
> > of everything from the beginning.
> I don't think you can do proper inference from these atomic tags
> alone. In fact you can think of facet::value as a rule similar to
> "facet -> value" ("when i care for the facet i obtain value")
> This probably shows the need for more logic in there.
> I think if you go for full first order logic you'll make the system too
> complex to be fast enough for real use.

I'd say let's catch complexity as needs arise.  And just by mandating
namespaces/facets and defining them as the facets literature suggests we
can catch enough complexity to keep going for at least 2 or 3 years, and
without even changing all the tools we already built.

> A month ago a new project was started being coordinated by my institute.
> Rewerse.net, "Reasoning on the web". There is also a couple of italian
> universities involved (it's an EU project)
> Seems like we're becoming more and more related to that. ;-)
> (which is cool, because i could do my diploma thesis in that... but i
> don't think i'm really going to do that.)

Actually, I just graduated, and if by the end of the summer I still
don't know what to do next... why not? :)

Is the University of Urbino involved in that?  I know they are
planning to start an Information Architecture course there, which is
exactly the topic we're moving in here.

Ciao,

Enrico

--
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico@debian.org>
