[Debtags-devel] where to end debram development

Thaddeus H. Black t@b-tk.org
Sun, 17 Oct 2004 23:35:08 +0000


Enrico asked,

> BTW, how do you define accuracy?

An excellent question.  (I had hoped that you would not ask it.)  Here
is a thorough answer.

I do not really define accuracy, do I?  The 97 to 98 percent figure
represents my own subjective sense of how good the ramification
is---based on living with it for the past two years, and on
misramification reports from you, Erich and others.  There are not two
or three typos per hundred---where I typed 1826 when I meant 1862, for
example.  Some typos remain but not nearly so many.  However---

  * Sometimes I misunderstand a package Description.  This may be
    because I misread it, because it is written badly, or because the
    package is highly esoteric and I lack the background to understand
    its purpose.  Occasionally I do unpack a doubtful package to learn
    more about it, but I lack time to do this for every doubtful
    package.  Some errors arise in this way.

  * Sometimes I understand the package Description but still misjudge
    where to ramify the package.

  * Sometimes the ramification plan itself is suboptimal, so that I
    can find no wholly correct place to put the package.  In such
    cases, the ram number I assign is sort of a half-error, no matter
    which number I choose.  ([1752 MySQL] contains several examples of
    this kind.)

  * Sometimes a user might simply disagree with my ramification of a
    particular package, even though I had a clear reason for ramifying
    it there.  Ramification is somewhat subjective.  If one user
    disagrees, then probably some other users also disagree, which
    means in some sense that the ramification is not wholly correct.

At any given moment, I think the ramification 100 percent correct.
Known errors are immediately fixed.  Thus there are no known errors.

> One of the problems that I have found is applying a list of tags to a
> ramification so that all those tags can be applied to all packages
> in that ramification.  This is not always easy to do: if I take
> 1823 X FONTS, for example, I would like to tag them with 'role::font' or
> 'x11::font', but I can't because there are font programs and not only
> fonts.  Or I would like to tag them with 'interface::x11', but I can't
> because there are fonts packages inside.  So, I can only map that to
> 'media::font', and with such a coarse mapping the accuracy you mention
> is really not needed, and not worth the effort.

I will offer some specific remarks below on how to improve the mapping,
but here I would say that the whole debram concept is simply not as good
as is the debtags concept.  Your words identify the fundamental debram
weakness.  If we counted this factor against debram, then I would say
that debram were only 60 to 70 percent good.  (I say "good" not
"accurate" because in this light the concept of accuracy fails: one
cannot rationally judge debram's "accuracy" with regard to how well
debram fits into the debtags scheme.)

However, even regarding debram by its own lights, consideration of the
factors above and experience with the rate at which new errors are found
make me reluctant to claim more than 97 or 98 percent accuracy overall,
despite all the care which has gone into the ramification.  For better
accuracy, more human editors are needed, different human editors who can
cross-check one another.  With debtags, we will have this advantage;
and---go bayesian go---also hopefully the advantage of some good
machine-checking, too.

> Some time ago I added debram support to autodebtag and I had my first
> experience with mapping debram into debtag.
>
> One possible thing you could do, however, if you're going through the
> ramification file, is trying to associate some tags to ramifications, so
> that they can be picked up by autodebtag.  It would be interesting to
> have them directly in the debram file, but if that can be a problem
> you can put them in the autodebtag script: there's a (big) table for
> that at the beginning of the script.

All right.  Ben agrees with you and no one seems to disagree.  I should
complete (a) and (b) as mentioned in the previous mail, then retire from
debram development as such.  About the practical problem of mapping
debram to debtags, you know much more than do I.  Unlike Ben, I do not
enjoy writing AI, and I am not very good at it; but I am beginning to
sense that some good AI might really help in completing the
debram -> debtags mapping.

However, there is this.  With the exception of a few branches
like [1362 Perl Modules], [1580 CJK], [1752 MySQL], [1864 KDE]
and [8671 Apache]---branches which I personally have lacked the time,
interest and/or knowledge to subramify---the ramification is rather
detailed.  It has some three hundred branches already; the sarge
reform will add more.  The detail is fine enough that we can
probably choose a few specific tags for each branch of the ramification.
Your [1823 X Fonts] example illustrates the point perfectly.  With the
specific tags chosen, a human editor could presumably tag the packages
in a given branch fairly quickly and quite accurately.  This would be
much, much faster, and much more accurate, than just starting with the
BIG HEAP of fourteen thousand packages would have been.
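
To make the idea concrete, here is a rough sketch in Python of what a
per-branch tag table might look like.  It is an assumption from top to
bottom: the names BRANCH_COMMON_TAGS, BRANCH_CANDIDATE_TAGS and
tags_for are invented for illustration and come from neither debram
nor autodebtag, and the assignment of tags to [1823 X Fonts] merely
follows Enrico's example above.

```python
# Hypothetical sketch; none of these names exist in debram or autodebtag.
# The branch number and the tag names are taken from the discussion above;
# how they are split between the two tables is illustrative only.

# Tags that hold for every package in a branch, safe to apply automatically.
BRANCH_COMMON_TAGS = {
    1823: {"media::font"},
}

# Narrower tags that fit only some packages in the branch.  A human editor
# chooses among these for each package.
BRANCH_CANDIDATE_TAGS = {
    1823: {"role::font", "x11::font", "interface::x11"},
}


def tags_for(branch, chosen=frozenset()):
    """Return the branch's common tags plus the candidate tags the editor chose."""
    common = BRANCH_COMMON_TAGS.get(branch, set())
    candidates = BRANCH_CANDIDATE_TAGS.get(branch, set())
    unknown = set(chosen) - candidates
    if unknown:
        # Refuse tags that were never offered for this branch.
        raise ValueError(
            f"not candidate tags for branch {branch}: {sorted(unknown)}"
        )
    return common | set(chosen)
```

With such a table, a font program in branch 1823 might get
tags_for(1823, {"interface::x11"}) and a plain font package
tags_for(1823, {"role::font", "x11::font"}), while both keep the coarse
'media::font' tag.  The editor chooses from three or four tags rather
than from the whole vocabulary.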

When it comes to this, I will appreciate working alongside Gustavo and
whoever else chooses to join in the manual tag-editing effort.  In fact
we will want more than just me and Gustavo.  Debram has taught me that
there are just too many packages for one fellow to handle alone.

-- 
Thad
