Faceted tags
Erich Schubert
erich@debian.org
Wed, 07 Apr 2004 01:14:18 +0200
Hi,
> "Transparent". However, the idea is not to refer to "the specific
> colour", but to the idea of "colour" itself. So, "color" is a facet,
> while "red" is the value of that facet.
What I wanted to say is: there are so many colors, and only a few items
have just one color. Would you really add the value
"color:red-with-white-stripes-in-spiral-form" for a candy bar?
Or would you add "color:red, color:white, color:pattern:striped,
color:pattern:spirals"? (Note the "need" for nesting.)
> > ... and here you certainly will have applications which "have to do
> > with" mp3 as well as ogg.
>
> Sure. But you can tag that application along the "File format" facet.
> Then you can tag it along the "Use" facet as "play", "record",
> "compress", "convert", "store", "organize"... to say more about how it
> "has to do".
I wasn't referring to the "what you can do" aspect, but to the problem
that you expect to place a package at only one location on an "axis" or
"dimension", when you might need several.
How would you classify a CD ripper?
use:convert:to:mp3, use:convert:to:ogg, use:convert:from:cdda ?
That is the difficulty I'm talking about: making these definitions
fine-grained enough to be useful, but coarse enough that you will
actually bother to enter them...
Of course "use:convert:to:mp3" could be made to match "use:*:mp3", but
you really have to keep the amount of data down to what is really
useful. Tagging as
use:convert, fileformat:mp3, fileformat:ogg, fileformat:cdda
can be good enough, and probably will be for the next 3 years.
Tagging as
use:convert, fileformat:audio
probably is too little information.
That is the work we have to do now: finding out how fine-grained the
information we ask people to enter needs to be.
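
To make the granularity trade-off concrete, here is a rough sketch of the "use:*:mp3" wildcard idea using Python's standard fnmatch module. The package names and the matching() helper are made up for illustration; only the tag strings come from the examples above, and no existing tool is assumed to work this way.

```python
from fnmatch import fnmatchcase

# Hypothetical data: the same CD ripper tagged at three levels of detail.
PACKAGES = {
    "ripper-fine": {"use:convert:to:mp3", "use:convert:to:ogg",
                    "use:convert:from:cdda"},
    "ripper-medium": {"use:convert", "fileformat:mp3", "fileformat:ogg",
                      "fileformat:cdda"},
    "ripper-coarse": {"use:convert", "fileformat:audio"},
}

def matching(pattern):
    """Return sorted names of packages with at least one tag matching pattern."""
    return sorted(
        name for name, tags in PACKAGES.items()
        if any(fnmatchcase(tag, pattern) for tag in tags)
    )

print(matching("use:*:mp3"))       # only the fine-grained tagging matches
print(matching("fileformat:mp3"))  # only the medium tagging matches
print(matching("*:mp3"))           # both of the above, but not the coarse one
```

A query for mp3 never finds the coarse tagging, which is the "too little information" case.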
> Yes, I understand that. But as an application framework, it also is a
> user interface toolkit. They'll then be happy to see GNUStep also in
> the Suite facet, and maybe also listed as a "devel::framework".
Again, that is one of the things we have to take care of before we start
tagging; this is what we need the tag task force for.
Of course using "namespaces" (or "facets" if you prefer that name...)
for every tag will reduce such mistakes. That certainly is one thing we
have already learned. ;-)
> And here's a very subtle trick: defining the facet, the point of view,
> reduces the waterfall effect in case of tag changes.
Yes, defining the precise meaning of a "namespace" will also define the
meaning of tags in the namespace.
> So, if a package uses HTTP technology, it'll always use the "HTTP
> technology" tag no matter what. I can decide to add a new facet, but on
> the specific "Technology" facet, that package will always have at least
> the HTTP tag.
Which actually is a badly defined tag, because you have http-client
technology, http-server technology, http-proxy technology, probably
-filters, -tunnels etc.
So the "facet" itself does not define a tag strictly enough IMHO.
And please don't come up with "we can use use:client, use:server" to
differentiate these. While you can easily derive use:client or
net:protocol:http from "net:protocol:http:client", the other direction
just doesn't work (= not uniquely determined) as soon as you have a lot
of tags - and you intend to have a lot.
Tagging a package as "webserver" is okay for me, because this is way
more precise than "net:server" plus "net:protocol:http" (which apply to
a transparent proxy as well).
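
A small sketch of why the derivation only works in one direction. The to_atomic() helper and the two example packages are hypothetical; the tag syntax follows the "net:protocol:http:client" example above.

```python
def to_atomic(nested_tags):
    """Flatten nested tags like 'net:protocol:http:client' into atomic tags."""
    atomic = set()
    for tag in nested_tags:
        # Split off the last component: the protocol part and the role.
        protocol, role = tag.rsplit(":", 1)
        atomic.add(protocol)
        atomic.add("use:" + role)
    return atomic

# Two hypothetical packages that do opposite things.
pkg_a = {"net:protocol:http:server", "net:protocol:ftp:client"}
pkg_b = {"net:protocol:http:client", "net:protocol:ftp:server"}

print(sorted(to_atomic(pkg_a)))
print(to_atomic(pkg_a) == to_atomic(pkg_b))  # same atomic tags for both
```

Both packages end up with the same four atomic tags, so the nested tags cannot be recovered from them.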
> Facets define a well-specified context for a property: the property is
> that, and that alone. Tags in facets become as atomic as possible:
If you want to go for atomic facets you have to store the connections
between them, too.
> Yes, we did that already with namespaces. Basically, transitioning to
> facets boils down to just mandating namespaces.
Which I certainly will support. ;-)
> There is a difference between using facets and implications, though: if
> you have "http" implying "net", then you have a "net" tag which means
> too much, or almost nothing. Facets don't allow you to have over-broad
> tags (which, by the way, are the big refactoring headaches).
Ignoring over-broad tags when appropriate has proven easy; having such
tags helps keep the hierarchy balanced.
Think of the file formats: if you give the user a list of some 100
file formats, he'll get lost. Having them grouped into video, audio, text
etc. helps a lot. Having a DAG is even better than a tree (ogg can be
video or audio, as you already mentioned; avi, asf, quicktime are other
such encapsulation formats).
Which hierarchy levels are shown and which are not is something the user
interface should (and can) decide.
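
As a rough illustration of grouping for display, assuming a made-up table of parent groups (the mp3 and txt entries and the group_formats() helper are only for the example):

```python
from collections import defaultdict

# Hypothetical parent groups. A format may have several parents,
# which makes this a DAG rather than a tree.
GROUPS = {
    "ogg": ["audio", "video"],
    "avi": ["audio", "video"],
    "mp3": ["audio"],
    "txt": ["text"],
}

def group_formats(groups):
    """Invert format -> parents into parent -> sorted list of formats."""
    menu = defaultdict(list)
    for fmt, parents in sorted(groups.items()):
        for parent in parents:
            menu[parent].append(fmt)
    return dict(menu)

for group, formats in sorted(group_formats(GROUPS).items()):
    print(group + ":", ", ".join(formats))
```

The user interface can then show only the group names first and expand a group on demand; ogg and avi simply appear under both audio and video.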
> Package identities are captured not by creating specific tags, but by
> inference. If a new package comes out which is difficult to
> categorize, then it may be a hint for a new facet/point of view from
> which to look at packages, holding the existing ones unchanged.
Well, you can't infer "webserver" from net:protocol:http and
use:server - it could be a proxy server, too.
> It's this last grouping that creates problems, IMO: "ogg" is an audio
> file format, but also a video file format. It really is a multimedia
> container which is commonly used to carry "vorbis" audio data, and can
> carry data encoded with other codecs.
This is not a problem at all, since we are in a DAG. fileformat::ogg can
imply both fileformat::video and fileformat::audio.
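
A minimal sketch of such implications, assuming they are stored as a plain mapping from a tag to the tags it directly implies. The IMPLIES table, the extra "fileformat" root level and the expand() helper are assumptions for the example, not an existing format.

```python
# Hypothetical implication edges: tag -> tags it directly implies.
IMPLIES = {
    "fileformat::ogg": {"fileformat::video", "fileformat::audio"},
    "fileformat::mp3": {"fileformat::audio"},
    "fileformat::video": {"fileformat"},
    "fileformat::audio": {"fileformat"},
}

def expand(tags):
    """Return the given tags plus everything they imply, transitively."""
    seen = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag not in seen:
            seen.add(tag)
            stack.extend(IMPLIES.get(tag, ()))
    return seen

print(sorted(expand({"fileformat::ogg"})))
```

Because visited tags are tracked in a set, a tag reachable through two parents is only added once, so multiple parents cause no trouble.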
> Instead of defining "audio file formats", you define "file format" and
> you define "media" and "technology". Then you have file-format::ogg,
> media::audio and codec::vorbis. If the OGG people will decide to put
> images or ELF objects inside OGG files, then all the existing facets
> will still be valid, and we'll start having applications tagged with
> "file-format::ogg", "media::raster-image" and "codec::jpg", and maybe
> "file-format::ogg", "devel::linker".
Ok, using media::audio and codec::vorbis certainly is a more precise way
of splitting this information. Unfortunately this makes tagging more
complex. Still, having codec::vorbis imply media::audio (for example)
makes sense IMHO.
My tag browser would also be happy with fileformat::ogg etc. and
media::audio - inside fileformat:: it will not show you a list of file
formats, but instead suggest media::something (since these groups most
probably will be better balanced).
> By categorizing with inference, we support a huge amount of cases we
> haven't thought of from the beginning. I see this as extremely
> important, as I definitely want to assume that we won't be able to think
> of everything from the beginning.
I don't think you can do proper inference from these atomic tags
alone. In fact you can think of facet::value as a rule similar to
"facet -> value" ("when I care about the facet, I obtain the value").
This probably shows the need for more logic in there.
I think if you go for full first-order logic you'll make the system too
complex to be fast enough for real use.
A month ago a new project coordinated by my institute was started:
Rewerse.net, "Reasoning on the web". A couple of Italian universities
are also involved (it's an EU project).
Seems like we're becoming more and more related to that. ;-)
(Which is cool, because I could do my diploma thesis in that... but I
don't think I'm really going to do that.)
Greetings,
Erich Schubert
-- 
erich@(vitavonni.de|debian.org) -- GPG Key ID: 4B3A135C (o_
Go away or i'll replace you with a very small shell script. //\
The shortest connection between two people is a smile. V_/_