[Secure-testing-team] Mini-meeting at DebConf - minutes

Florian Weimer fw at deneb.enyo.de
Wed Jun 27 12:27:09 UTC 2007


Well, I couldn't make it to debconf, but I probably should contribute
a few notes anyway.

Status of the tracker software
------------------------------

As most of you probably know, the web service
(<http://security-tracker.debian.net/>, <http://idssi.enyo.de/tracker/>)
works by watching for Subversion commits and updating its internal
database accordingly.  The data used by debsecan to display
vulnerability information is computed as well.
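
To make the setup concrete, here is a minimal sketch (not the actual
tracker code) of such a polling loop, in Python; the repository URL,
the polling interval, and the rebuild command are placeholders I made
up for illustration.

    import re
    import subprocess
    import time

    REPO_URL = "svn://svn.debian.org/secure-testing"   # placeholder URL

    def latest_revision(url):
        """Return the youngest revision of the repository as an integer."""
        out = subprocess.run(["svn", "info", url], capture_output=True,
                             text=True, check=True).stdout
        return int(re.search(r"^Revision: (\d+)$", out, re.MULTILINE).group(1))

    def rebuild_database():
        # Placeholder for the expensive step: re-importing data/CVE/list
        # and recomputing the per-package vulnerability tables.
        subprocess.run(["make", "update-db"], check=True)   # hypothetical target

    last_seen = latest_revision(REPO_URL)
    while True:
        time.sleep(60)
        current = latest_revision(REPO_URL)
        if current > last_seen:        # several commits may be handled at once
            rebuild_database()
            last_seen = current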

Processing a single commit currently takes more than ten minutes.
Incorporating Package and Source file updates takes even longer.  It's
not necessary to process every commit individually, which is why the
system can keep up with the commits.  I suppose that switching to
faster hardware could bring this down to 40 seconds or so, which still
isn't acceptable IMO.  (That estimate assumes top-of-the-line amd64
hardware with oodles of RAM, just to put the number into perspective.)
Not being able to run this tool locally makes editing the data files
unnecessarily hard, even though the plain syntax check performed by
"make check" is quick.
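
For illustration, here is a rough sketch of the kind of quick syntax
check "make check" can perform; it assumes a simplified subset of the
data/CVE/list syntax and is not the project's actual checker.

    import re
    import sys

    HEADER = re.compile(r"^CVE-\d{4}-(\d+|XXXX)( \(.*\))?$")
    ANNOTATION = re.compile(
        r"^\t(- \S+.*|NOT-FOR-US:.*|NOTE:.*|TODO:.*|RESERVED|REJECTED.*)$")

    def check(path):
        ok = True
        seen_header = False
        for lineno, line in enumerate(open(path), 1):
            line = line.rstrip("\n")
            if not line:
                continue
            if HEADER.match(line):
                seen_header = True
            elif seen_header and ANNOTATION.match(line):
                pass
            else:
                ok = False
                print("%s:%d: cannot parse: %r" % (path, lineno, line))
        return ok

    if __name__ == "__main__":
        sys.exit(0 if check("data/CVE/list") else 1)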

Part of the problem is that all data is pre-computed.  One obvious
change would be to pre-compute the data for the overview pages (for
unstable/testing/stable) only, and recompute the per-package
vulnerability status as needed for the per-package/per-bug pages.
However, the logic that decides whether or not a package is vulnerable
is a series of SQL statements, not Python code[*], which makes this
somewhat difficult.  It would also help a lot if the code used
hashes to detect which data/CVE/list entries have actually changed,
and re-compute only the affected package and bug data.  Again, this
appears to be difficult to implement in the current framework.

[*] Actually, there are two different SQL implementations and two
    Python implementations (one of them in debsecan), if I recall
    correctly.  It seems that they are mostly in sync, but it's not a
    nice design.
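
To sketch the hashing idea (my own approximation, using assumed data
structures rather than the existing schema): remember a digest per
CVE entry and recompute derived data only for entries whose digest
changed.

    import hashlib

    def entry_digest(lines):
        # Digest of one data/CVE/list entry (header line plus annotations).
        h = hashlib.sha1()
        for line in lines:
            h.update(line.encode("utf-8"))
        return h.hexdigest()

    def plan_recomputation(entries, old_digests):
        # entries: dict mapping CVE id -> list of lines from data/CVE/list.
        # old_digests: dict mapping CVE id -> digest from the previous run.
        # Returns changed/new ids, removed ids, and the new digest table.
        new_digests = {}
        changed = set()
        for cve_id, lines in entries.items():
            digest = entry_digest(lines)
            new_digests[cve_id] = digest
            if old_digests.get(cve_id) != digest:
                changed.add(cve_id)
        removed = set(old_digests) - set(entries)
        return changed, removed, new_digests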

I've been experimenting recently with a GUI editor for data/CVE/list which
simplifies creating NOT-FOR-US entries.  The Bayesian learning module
didn't work as well as I had hoped, but I still think that such a tool
can be helpful.  If actually used, it would streamline the process of
rating CVEs and avoid stupid mistakes (like missing JFFNMS when
there's actually a package called "jffnms").  This still needs
significant amounts of work, though.  The current prototype depends on
non-free software; I also used it as an exercise to become familiar
with this particular way of GUI development.  It might also make sense
to re-cast this as a web application.  Whatever we do, we should share
the infrastructure code (file parsing etc.) with the code that feeds
the database.
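
One safety net such an editor could offer (a sketch under assumptions
of my own; the package list file is a made-up input with one source
package name per line): before accepting a NOT-FOR-US entry, check
whether any word of the CVE description matches a known Debian package
name, which would have caught "JFFNMS" versus jffnms.

    import re

    def load_source_packages(path="source-packages.txt"):   # hypothetical file
        return set(line.strip() for line in open(path) if line.strip())

    def suspicious_words(description, packages):
        # Words of the description that coincide with package names.
        words = set(w.lower()
                    for w in re.findall(r"[A-Za-z0-9+.-]{3,}", description))
        return sorted(words & packages)

    packages = load_source_packages()
    hits = suspicious_words("Cross-site scripting in JFFNMS before 0.8.4",
                            packages)
    if hits:
        print("warning, possible Debian packages mentioned:", ", ".join(hits))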

Right now, I fear that it's not feasible for anyone except me to work
on the software.  But my next step will be to document the actual
algorithm used to derive exact per-bug, per-package vulnerability data
from the data/CVE/list.  Unfortunately, it's a bit of a black art in
some corners.  This document will also make clear why addressing bug
#357942 is so hard, and perhaps even show what changes are needed for
a fix.  Based on that documentation, it should be possible to change the
Python code, or rewrite it from scratch, if anyone is actually
interested.  I also plan to improve matters by writing actual code,
but the amount of time I can put into this depends on a factor I can
only partially control (there's one fairly important hobby project of
mine which I really need to bring forward, but my employer might
actually put developer resources into it, so I wouldn't have to do it
in my spare time, all by myself).
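
To give a flavour of what that documentation has to pin down, here is
a much-simplified approximation of the core rule (my own sketch, not
the tracker's actual algorithm, and it assumes python-apt for version
comparison): a package version in a suite is considered vulnerable if
it is older than the version that fixes the issue, or if no fix is
known at all.

    import apt_pkg
    apt_pkg.init_system()

    def is_vulnerable(suite_version, fixed_version):
        if fixed_version is None:      # no fixed version known anywhere
            return True
        return apt_pkg.version_compare(suite_version, fixed_version) < 0

    # Made-up example versions, for illustration only.
    print(is_vulnerable("0.8.3dfsg.1-2", "0.8.3dfsg.1-3"))   # True
    print(is_vulnerable("0.8.3dfsg.1-3", "0.8.3dfsg.1-3"))   # False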

No progress whatsoever has been made on the "document the likely level
of security support packages will receive" front, I'm afraid.


