New tags for biology and medicine.

Thu Sep 6 20:21:57 UTC 2007

Hello,

On Thu, 2007-09-06 at 20:30 +0900, Charles Plessy wrote:
> Dear all,
> 
> I was a bit lazily waiting for the conversation to settle before trying
> to answer :)
> 
> 
> > +Tag: field::biology:bioinformatics
> > +Tag: field::biology:molecular
> > +Tag: field::biology:structural
> > 
> > This is probably a reasonable distinction, though we have to decide if
> > we want such a fine-grained separation of the "field" facet. We would
> > also end up with needing the same level of detail for electronics,
> > chemistry, physics,...
> 
> I think that I would have a pragmatic approach: fine-graining as long
> as there is a consensual demand. By this I mean that fine-graining a facet
> should not become a hassle for the package maintainers who are not
> interested in them. In the case of the Debian-Med project, I think that
> each time we will propose such kind of tags it will mean that we have
> people dedicated to screen all the parent tags and assign the
> fine-grained if necessary.

There are two more groups of people to consider:
     1. the users who search based on tags and
     2. the people doing the tagging.
Each new tag increases the complexity of the vocabulary, and only a
small percentage of the people mentioned above are interested in the
level of detail provided by the med-specific tags. However, they have to
deal with those tags either way. It is to reduce the burden on those
people that I proposed keeping the tags in a separate facet. It might
even make things easier for med-interested people, because they would
probably recognise the biology:: facet as an important one and go
straight there to look for interesting tags.

> (by the way, could there be a subscription mechanism to monitor addition
> and removal of tags ?)

As far as I know, the best option right now is an SVN diff, which could
in theory be hooked up to send an email upon changes. However, I believe
no such thing is currently implemented.
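
As a rough illustration of what such a hook could compute, here is a
minimal Python sketch. It assumes the vocabulary is a text file in which
each tag is declared on a line starting with "Tag:", as in the diff
quoted in this thread. The function names are made up for this example,
and the SVN hook itself and the mail delivery are left out.

```python
def extract_tags(vocabulary_text):
    """Collect the name from every 'Tag: <name>' line (assumed format)."""
    tags = set()
    for line in vocabulary_text.splitlines():
        if line.startswith("Tag:"):
            tags.add(line[len("Tag:"):].strip())
    return tags


def tag_changes(old_text, new_text):
    """Return (added, removed) tag names between two vocabulary versions."""
    old_tags = extract_tags(old_text)
    new_tags = extract_tags(new_text)
    return sorted(new_tags - old_tags), sorted(old_tags - new_tags)


def format_report(added, removed):
    """Build a plain-text summary that could be used as a mail body."""
    lines = ["+ " + tag for tag in added] + ["- " + tag for tag in removed]
    return "\n".join(lines)
```

Feeding the old and new revisions of the vocabulary to tag_changes()
would give the two lists a subscriber is interested in; how the two
revisions are obtained from SVN is deliberately not shown.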

> > +Tag: works-with::sequence
> > +Description: Sequence
> > + The program manipulates data made of a sequence of elements from a
> > finite set.
> > 
> > Somehow this is different to the current tags in works-with, but I
> > believe it could fit in. E.g. sorting applications could also fit in
> > here?
> 
> I think that this is exactly the goal. Sometimes there is innovative
> research which is done by taking tools for analysing genome sequence and
> utilizing them on written language, or vice-versa. I would see this tag
> with a high level of abstraction.

Are there packages out there which work on general sequences (i.e. are
independent of the type of the sequence)? The utility "sort" comes to
mind, which can work on many different types (numbers, strings, dates).
What you describe is obviously a nice idea, but I think it is beyond the
scope of debtags. A package for DNA analysis will probably not work when
fed with written language (without modification). And debtags is about
describing what a package can do as it is.

> > +Tag: works-with::sequence:nuceleic
> > +Description: Nucleic acids
> > + Sequence of nucleic acids: DNA, RNA but also non-natural nucleic acids
> > such as PNA or LNA.
> > +
> > +Tag: works-with::sequence:peptidic
> > +Description: Proteins
> > + Sequence of aminoacids: peptides and proteins.
> > 
> > Quite detailed, though otherwise, people probably won't pick
> > works-with::sequence if searching for algorithms working on a DNA.
> 
> I made this proposition with the goal of having a lot of debian-med
> packages which manipulate sequence. In that context, the biologist would
> naturally want to distinguish between proteins and nucleic acids: this
> is a very common distinction. But shall we wait before we have, say 50
> packages which have field::biology and works-with::sequence?

I have suggested moving those into the biology:: facet, so you get full
expressivity without bloating the works-with:: facet.

> > +Tag: works-with-format::plaintext:aln
> > +Tag: works-with-format::plaintext:fasta
> > +Tag: works-with-format::plaintext:nexus
> 
> This is definitely an area where there is an overlap between mime types
> and tags. But I would definitely be excited if debtags could propose
> toolchains which are connected by the formats they accept. Once again,
> we do not have the critical mass yet...

Same here, I proposed to put them into biology::.

> A few words of the proposals you made in another mail:
> 
> > * ::bioinformatics, ::molecular-biology, ::structural-biology
> I would rather see field::biology:molecular than
> field::biology:molecular-biology, 

Sure.
However, my proposal was to have biology::molecular-biology. You seem to
prefer keeping this in the main field facet, though, which is also OK.

> biology::molecular-biology:structural instead of
> biology::structural(-biology) may horrify some of our colleagues, though.

I think you have misread my proposal here. Or I am misunderstanding you.
What would horrify your colleagues?

> >       * ::emboss
> I strongly advocate suite::emboss; we will get the critical mass for it.

Again I would move that into a biology facet.

> In conclusion, about the possibility to manage ourselves our sets of
> tags. In the everyday work, one has a very narrow point of view of his
> tools. I use a PCR machine to "make a PCR", I use a Pipetmanⓡ to
> "pipette",... this could be expressed by biology::PCR, and
> biology::pipetting. But if we think harder, we can have a higher point
> of view. Instead of biology::PCR it would be use::amplification, or
> use::diagnostic, for instance, because the PCR machine produces DNA, but
> sometimes we want to keep it as a reagent, and some other times we just
> want to see its size and then we throw it away.
> 
> So the questions I am wondering about are :
> 
>  - What is the most powerful approach ?
>  - What is the expectations of our users ?
>  - How can we interest our users in an unexpected and powerful usage of
>    the DebTags ?

We had the discussion about the degree of detail for the vocabulary
(which is the set of facets and tags) before, and most agree that a high
degree is desirable. The complexity of a larger number of tags can be
made manageable by a good user interface. However, I think this applies
only to the "general purpose" domain (i.e. search criteria required by
the majority of users). The other special-purpose domains (devel,
security, medicine, ...) should IMO be provided in separate modules of
encapsulation (if you forgive me using this term from software
engineering terminology), which in this case can be represented as
separate facets. Within those facets a high degree of detail can be
achieved again.
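
The encapsulation idea can be sketched roughly as follows. This is only
an illustration in Python, not how any debtags tool works: the set of
special-purpose facets, the function names and the example tags are
assumptions made for this sketch. The only thing taken from the
vocabulary is that a tag is written as facet::name.

```python
from collections import defaultdict

# Hypothetical choice of facets treated as special-purpose modules.
SPECIAL_FACETS = {"biology", "devel", "security"}


def group_by_facet(tags):
    """Map each facet to the tag names it contains ('facet::name')."""
    facets = defaultdict(list)
    for tag in tags:
        facet, _, name = tag.partition("::")
        facets[facet].append(name)
    return dict(facets)


def visible_tags(tags, enabled_special=()):
    """Hide special-purpose facets unless the user has enabled them."""
    enabled = set(enabled_special)
    result = []
    for tag in tags:
        facet = tag.partition("::")[0]
        if facet not in SPECIAL_FACETS or facet in enabled:
            result.append(tag)
    return result
```

A general-purpose search interface would call visible_tags(tags) and
never show the biology:: tags, while a med-interested user would pass
enabled_special={"biology"} and get the full detail of that facet.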

> I think that an advanced usage of Debtags is the only way to bring
> attention of users and ourselves to programs which we do not expect to
> be relevant to their fields. This is why I am pushing a bit for more
> fine-grained tags in multiple official facets, rather than a private
> biology:: facet in which we will reproduce the idiosyncrasies of our
> disciplines...

You do have a point here. However, I think the arguments in favour of a
separate facet are stronger. Since the debtags team is in favour of this
approach and Andreas and Steffen seem to be fine with it too, I would
like to proceed that way. I hope you can also live with that.

Regards Ben

P.S. the art of finding a consensus is a very difficult one, perhaps we
should write some software to aid us with that.
Oh, I know one, it is called rand() ;-)