[Daca-general] scan-build and metrics gsoc proposals and DACA
Raphael Geissert
geissert at debian.org
Sun Mar 17 17:46:38 UTC 2013
Hi Sylvestre, Zack, everyone,
While going through the list of GSoC proposals I found two that are
closely related to my idea of DACA.
Sylvestre's scan-build proposal faces essentially the same problems DACA
does (job scheduling and data reporting). Moreover, if the tool evolves
and ends up running as a service, it will probably hit the same wall
DACA has: storing the data and tracking package and tool versions.
The current project proposal seems a bit thin to me, at least as far as
the description goes; scalability is not even mentioned.
The results can obviously be put somewhere under qa.d.o/daca; but just so
that there is no confusion: there is no "magic" behind the HTML reports,
only some makefiles and PHP scripts whose output is stored as static
files.
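For illustration, the whole "report generation" step boils down to
something like the following sketch (in Python rather than the actual
PHP/make combination, purely to show the shape of it; the paths and file
layout are made up):

    #!/usr/bin/env python3
    # Illustration only: the real reports are produced by makefiles and
    # PHP scripts. The directory layout and file names are hypothetical.
    import html
    import os

    RESULTS_DIR = "results/cppcheck"    # one plain-text result per package
    OUTPUT_FILE = "htdocs/cppcheck.html"

    def render_report(results_dir, output_file):
        rows = []
        for name in sorted(os.listdir(results_dir)):
            with open(os.path.join(results_dir, name)) as f:
                issues = f.read().splitlines()
            rows.append("<tr><td>%s</td><td>%d</td></tr>"
                        % (html.escape(name), len(issues)))
        with open(output_file, "w") as out:
            out.write("<table>\n<tr><th>package</th><th>issues</th></tr>\n")
            out.write("\n".join(rows) + "\n</table>\n")

    if __name__ == "__main__":
        render_report(RESULTS_DIR, OUTPUT_FILE)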
Then there is Zack's metrics proposal, which covers another goal DACA
aims to address (admittedly, there hasn't been much visible progress on
that side).
DACA is a potential source of plenty of data points, for instance:

- How many issues of type X did tool T report for a given
  (pkg, pkgversion, tool, toolversion) tuple?
- How does that compare to the same package and version run through a
  newer or older version of the tool, or through a different set of
  options (e.g. experimental ones)?
- What about two tools reporting an issue on the same line of code?
- Has the number of issues reported by a given tool decreased over time?
  Is there a bump on .0 versions?
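To make that kind of query concrete, here's a rough sketch assuming the
results were stored in a small SQLite table keyed on that tuple (the
schema and package names are made up for illustration, not what DACA
stores today):

    # Sketch only: DACA does not store results in a database today; the
    # schema below is a hypothetical rendering of the tuple above.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE result (
                      pkg TEXT, pkgversion TEXT,
                      tool TEXT, toolversion TEXT,
                      issue_type TEXT, line INTEGER)""")

    # How many issues of each type did a tool report for a given tuple?
    per_type = db.execute("""
        SELECT issue_type, COUNT(*) FROM result
        WHERE pkg = ? AND pkgversion = ? AND tool = ? AND toolversion = ?
        GROUP BY issue_type""",
        ("somepkg", "1.0-1", "cppcheck", "1.58")).fetchall()

    # Two tools flagging the same line of the same package version:
    overlap = db.execute("""
        SELECT a.pkg, a.pkgversion, a.line, a.tool, b.tool
        FROM result a JOIN result b
             ON a.pkg = b.pkg AND a.pkgversion = b.pkgversion
            AND a.line = b.line
        WHERE a.tool < b.tool""").fetchall()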
And that's just the tip of the iceberg. The real problem, though, is
proper job scheduling and data processing (e.g. to generate the "dumb
HTML" reports, combined views, and so on).
So here's where, I believe, all three projects meet: there is currently no
proper infrastructure for doing that kind of thing.
For DACA I have an initial implementation of such a system that uses
gearman jobs for everything from announcing a new package version to
reacting to that "job" and triggering further jobs ("get the list of
tools", "call every tool", "store the result", "notify of a new result",
"get the list of result analysers", "trigger new jobs for every single
tool", and so on).
This started well, and the idea looked good at first sight, since you can
connect multiple job servers and workers; but that only covers one part
of the problem.
It seems that what is actually needed is something like Hadoop and
friends, and that is the point where I'm currently stuck with DACA. We
don't even have the proper stack in Debian.
What do you think about all this? Would it be better to rethink the
proposals a bit and try to come up with something bigger (but split up so
that more than one student can work on it)?
Cheers,
--
Raphael Geissert - Debian Developer
www.debian.org - get.debian.net