New tags for biology and medicine.

Rudi Cilibrasi cilibrar at
Thu Sep 6 21:03:49 UTC 2007

On 9/6/07, Benjamin Mesing <bensmail at> wrote:
> Hello,
> Are there packages out there which work on general sequences (i.e. are
> independent of the type of the sequence)? The utility "sort" comes to my
> mind, which can work on many different types (numbers, strings, dates).
> What you describe is obviously a nice idea, but I think it is beyond the
> scope of debtags. A package for DNA analysis will probably not work when
> fed with written language (without modification). And debtags is
> about describing what a package can do as it is.

The complearn-tools package is one example; it works well with genetic
sequences, protein sequences, written human languages, compiled
executable code, and many other domains.  It is in the class of
algorithms called "universal learners" which have recently gained
popularity. [1,2]  This terminology is not without support given
recent results in universal type theory. [3]
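
To make the idea concrete, here is a minimal sketch of the normalized
compression distance (NCD) from [2]. It is only an illustration and not
the complearn-tools code: zlib stands in for whatever compressor one
prefers, the function names are my own, and the sample inputs (a
pseudo-random "DNA" string, a mutated copy, and a few English sentences)
are made up. The same function is applied to every input, whatever kind
of sequence it holds.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (zlib is an assumed stand-in)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Made-up sample data: a pseudo-random nucleotide string ...
rng = random.Random(0)
dna = bytes(rng.choice(b"ACGT") for _ in range(2000))

# ... a copy with every 50th base replaced by a different base ...
mutated_array = bytearray(dna)
for i in range(0, len(mutated_array), 50):
    mutated_array[i] = ord("C") if mutated_array[i] == ord("A") else ord("A")
mutated = bytes(mutated_array)

# ... and some written language.
text = (
    b"Debtags describes what a package can do as it is. "
    b"A biologist would naturally want to distinguish between proteins "
    b"and nucleic acids, while a linguist cares about words and grammar. "
    b"A compressor makes no such distinction and treats both as bytes."
)

print("dna vs. itself:       ", ncd(dna, dna))
print("dna vs. mutated copy: ", ncd(dna, mutated))
print("dna vs. English text: ", ncd(dna, text))
```

Smaller values mean "more similar". The related DNA strings should come
out much closer to each other than the DNA string is to the English
text, even though nothing in the code knows what DNA or English is.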

Best regards,


[2]: R. Cilibrasi, P.M.B. Vitanyi, Clustering by compression, IEEE
Trans. Information Theory, 51:4 (2005)
[3]: G. Seroussi, On universal types, HP Tech report HPL-2004-153
(2004-09-17)

> > > +Tag: works-with::sequence:nuceleic
> > > +Description: Nucleic acids
> > > + Sequence of nucleic acids: DNA, RNA but also non-natural nucleic acids
> > > such as PNA or LNA.
> > > +
> > > +Tag: works-with::sequence:peptidic
> > > +Description: Proteins
> > > + Sequence of amino acids: peptides and proteins.
> > >
> > > Quite detailed, though otherwise, people probably won't pick
> > > works-with::sequence if searching for algorithms working on a DNA.
> >
> > I made this proposal with the goal of having a lot of debian-med
> > packages which manipulate sequences. In that context, the biologist
> > would naturally want to distinguish between proteins and nucleic acids:
> > this is a very common distinction. But shall we wait until we have, say,
> > 50 packages which have field::biology and works-with::sequence?
> I have suggested to move those into the biology:: facet, so you get full
> expressivity without bloating the works-with:: facet.
> > > +Tag: works-with-format::plaintext:aln
> > > +Tag: works-with-format::plaintext:fasta
> > > +Tag: works-with-format::plaintext:nexus
> >
> > This is definitely an area where there is an overlap between mime types
> > and tags. But I would definitely be excited if debtags could propose
> > toolchains which are connected by the formats they accept. Once again,
> > we do not have the critical mass yet...
> Same here, I proposed to put them into biology::.
> > A few words on the proposals you made in another mail:
> >
> > > * ::bioinformatics, ::molecular-biology, ::structural-biology
> > I would rather see field::biology:molecular than
> > field::biology:molecular-biology,
> Sure.
> However, my proposal was to have biology::molecular-biology. Though, you
> seem to prefer to keep this in the main field facet, which is also ok.
> > biology::molecular-biology:structural instead of
> > biology::structural(-biology) may horrify some of our colleagues, though.
> I think you have misread my proposal here. Or I am misunderstanding you.
> What would horrify your colleagues?
> > >       * ::emboss
> > I strongly advocate suite::emboss; we will get the critical mass for it.
> Again I would move that into a biology facet.
> > In conclusion, about the possibility of managing our sets of tags
> > ourselves. In everyday work, one has a very narrow point of view of
> > one's tools. I use a PCR machine to "make a PCR", I use a Pipetman® to
> > "pipette",... this could be expressed by biology::PCR, and
> > biology::pipetting. But if we think harder, we can have a higher point
> > of view. Instead of biology::PCR it would be use::amplification, or
> > use::diagnostic, for instance, because the PCR machine produces DNA, but
> > sometimes we want to keep it as a reagent, and some other times we just
> > want to see its size and then we throw it away.
> >
> > So the questions I am wondering about are:
> >
> >  - What is the most powerful approach?
> >  - What are the expectations of our users?
> >  - How can we interest our users in an unexpected and powerful usage of
> >    Debtags?
> We had the discussion of the degree of detail for the vocabulary (which
> is the set of facets and tags) before, and most agree that a high degree
> is desirable. The complexity of a larger number of tags can be made
> manageable by a good user interface. However, I think this applies only
> for the "general purpose" domain (i.e. search criteria required by the
> majority of users). The other special-purpose domains (devel,
> security, medicine, ...) IMO should be provided in separate modules of
> encapsulation (if you forgive me using this term from software
> engineering terminology) - which in this case can be represented as
> separate facets. Within those facets a high degree of detail can be
> achieved again.
> > I think that an advanced usage of Debtags is the only way to bring
> > attention of users and ourselves to programs which we do not expect to
> > be relevant to their fields. This is why I am pushing a bit for more
> > fine-grained tags in multiple official facets, rather than a private
> > biology:: facet in which we will reproduce the idiosyncrasies of our
> > disciplines...
> You do have a point here. However, I think the arguments in favour of a
> separate facet are stronger. Since the debtags team is in favour of this
> approach and Andreas and Steffen seem to be fine with it too, I would
> like to proceed that way. I hope you can also live with that.
> Regards Ben
> P.S. the art of finding a consensus is a very difficult one, perhaps we
> should write some software to aid us with that.
> Oh, I know one, it is called rand() ;-)

"We can try to do it by breaking free of the mental prison of
separation and exclusion and see the world in its interconnectedness
and non-separability, allowing new alternatives to emerge."

More information about the Debtags-devel mailing list