RDF / SPARQL "API" for debbugs - Was: Re: Getting rid of bts.d.o files parsing: alternatives

Olivier Berger olivier.berger at it-sudparis.eu
Tue Jan 27 08:35:42 UTC 2009


On Monday 26 January 2009 at 22:53 +0100, Sandro Tosi wrote:
> Hi all,
> Today I had an interesting discussion on #debian-devel about ways to
> avoid processing bugs.d.o files; here is a recap.
> 
> * Current file parsing - let's remove it
> it works, but replicating GBs (to work locally) or grepping over many
> files is not an approach I would stay with forever ;) In particular because
> it then uses those files again to retrieve other bug information, whereas
> we should query the SOAP interface once we have the bug #.
> 

Well, on the other hand, it helps reduce the network load when all that is
needed is a massive batch run and all the required information is in the DB...

Maybe a mixed mode would be the best option, still.

I'd suggest not getting rid of it at all, but instead implementing two
parallel "bug attribute fetching" back-ends: one working offline on the
copy of the DB, and the other one using SOAP.
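
As a rough sketch of what such a mixed mode could look like in Python:
all class and function names below are hypothetical (they are not taken
from python-btsutils or bts-link), the offline parser only understands
simple "Key: value" header lines (an assumed simplification of what the
local files contain), and the SOAP side is just a callable into which a
real client would be plugged.

```python
from typing import Callable, Dict, Optional

BugAttrs = Dict[str, str]


def parse_summary_text(text: str) -> BugAttrs:
    """Parse 'Key: value' header lines into a dict (assumed, simplified format)."""
    attrs: BugAttrs = {}
    for line in text.splitlines():
        if not line.strip():
            break  # stop at the first blank line, as with mail-style headers
        key, sep, value = line.partition(":")
        if sep:
            attrs[key.strip().lower()] = value.strip()
    return attrs


class OfflineFetcher:
    """Looks bugs up in a local mirror, modelled here as {bug number: file text}."""

    def __init__(self, mirror: Dict[int, str]) -> None:
        self._mirror = mirror

    def fetch(self, bug: int) -> Optional[BugAttrs]:
        text = self._mirror.get(bug)
        return parse_summary_text(text) if text is not None else None


class RemoteFetcher:
    """Wraps any callable that does the remote (e.g. SOAP) lookup for one bug."""

    def __init__(self, lookup: Callable[[int], Optional[BugAttrs]]) -> None:
        self._lookup = lookup

    def fetch(self, bug: int) -> Optional[BugAttrs]:
        return self._lookup(bug)


class MixedFetcher:
    """Tries the offline copy first and only goes to the network on a miss."""

    def __init__(self, offline: OfflineFetcher, remote: RemoteFetcher) -> None:
        self._offline = offline
        self._remote = remote

    def fetch(self, bug: int) -> Optional[BugAttrs]:
        attrs = self._offline.fetch(bug)
        return attrs if attrs is not None else self._remote.fetch(bug)
```

With this shape, a batch run over bugs present in the local copy never
touches the network, and the remote back-end is only used for what the
copy does not have.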

In the latter case, you probably already know that there are a couple
of Python libs already using SOAP (bts-utils, etc.) that may be reused
instead of reinventing the wheel ;)
I tried to list some of them here, btw:
https://picoforge.int-evry.fr/cgi-bin/twiki/view/Helios_wp3/Web/BugHarvestingInPython
Actually, I've recently done things the other way round for a POC for
project HELIOS: I extended python-btsutils so that it can parse the
contents of the debbugs database .summary files, reusing bits of
bts-link for that... it's more or less in
http://picoforge.int-evry.fr/websvn/listing.php?repname=helios_wp3&path=/trunk/python/
but I still need to clean up my git setup so that my branches are
cleaner and I can better separate my new bits from the standard code.

> * UDD - a master database to collect all Debian project information,
> hence bugs too:
> 
> udd=> select count(*) from bugs where forwarded like 'http%' AND
> status not in ('done','fixed');
>  count
> -------
>   4392
> (1 row)
> 
> this is a sample query to retrieve all open bugs in forwarded state.
> I don't think the udd psql db will ever be opened for everyone to connect
> to, so this way will only be accessible from debian hosts.
> dondelelcaro (BTS maintainer) noted that adding another layer over BTS
> might not be that smart. And UDD lags some hours behind (since bts info
> is updated only every 4 hrs)
> 

IMHO, having bts-link work from anywhere on the Net would be a plus, for
instance to allow me to test improvements without having access to
a Debian server account.

The same goes for any tool similar to bts-link that needs to query
the bugtracker.

> * Current API
> 
> $ bts select status:forwarded | wc -l
> 6915
> 
> So there already exists a way to query forwarded bugs, but dondelelcaro
> discourages using it since it's sloooooow
> 

Sure, SOAP on a mass of bugs is probably not fast ;)

> * New API - to be requested
> 
> dondelelcaro suggested that I request a new API to retrieve all the
> forwarded bugs in an efficient manner, so we can parse it and do
> what we have to.
> 

I think there are probably things worth a look elsewhere
(Launchpad ?).

For instance, there is the possibility of caching some metadata in an RDF
store on the bugtracker side, and allowing SPARQL queries to be submitted
from the outside.
That would allow querying the debbugs server using *standard* ontologies,
hence opening it up to interoperability.
That's the approach we're currently trying out in Helios (reusing what
was done in Nepomuk: see SWIM below), for instance, to allow more
interoperability between bugtrackers, and maybe make bts-link a generic,
non-Debian-specific tool.

I lack the time to document such an approach here ATM, but I'd be glad to
discuss it further (maybe at FOSDEM ?), and to put an example of
such an RDF store online for people to grasp the idea.

In the meantime, you may get an idea of such a system on Mandriva's side
with SWIM : http://club.mandriva.com/xwiki/bin/view/swim/ (currently
down), which is an outcome of the aforementioned Nepomuk project's
developments.

To give an idea of the SPARQL queries that can be run against the
bugtracker's RDF API, here's an example (real data from a substantial copy
of debbugs data, circa 250000 bugs parsed and stored as RDF in
openvirtuoso on my laptop, currently) :

        prefix lino: <http://nepomuk.kde.org/ontologies/2008/11/25/lino#>
        prefix bom: <http://www.ifi.unizh.ch/ddis/evoont/2008/11/bom#>
        
        select * where { 
        ?a lino:hasReporter <mailto:/olivier.berger at it-sudparis.eu> .
        ?a bom:isIssueOf ?c
        } order by ?c
        
gives :
a  	c
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496918 software:/apache2-mpm-prefork
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497534 software:/apache2.2-common
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=417898 software:/beagle
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418540 software:/bluez-utils
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=420571 software:/brasero
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487930 software:/bzr
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=471506 software:/cron
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=272674 software:/db4.2-util
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494300 software:/debhelper
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483418 software:/debian-policy
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=485776 software:/debian-policy
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=464834 software:/dhcdbd
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483119 software:/dpkg-dev
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=499803 software:/egroupware-core
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=499804 software:/egroupware-core
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=322554 software:/evolution
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=481486 software:/evolution
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422719 software:/exim4-config
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=455754 software:/firehol
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496071 software:/glpi
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=382074 software:/gnome-applets
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=328830 software:/gnome-system-tools
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418543 software:/gnome-vfs-obexftp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=359920 software:/gnome-volume-manager
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=320790 software:/hal
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451350 software:/icedove-gnome-support
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=485608 software:/libc6
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=310380 software:/libldap2
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451282 software:/linux-2.6
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496069 software:/moodle
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=492216 software:/nagios3-common
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=469267 software:/nautilus
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=482107 software:/network-manager-gnome
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=407471 software:/phpgroupware
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487022 software:/phpgroupware-0.9.16-core-base
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487354 software:/phpgroupware-0.9.16-core-base
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=387455 software:/phpgroupware-eldaptir
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=387462 software:/phpgroupware-eldaptir
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=472685 software:/phpgroupware-phpsysinfo
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=339317 software:/pidentd
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=379799 software:/pmount
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=326060 software:/poxml
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=455770 software:/python-setuptools
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=474113 software:/revelation
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=475158 software:/snmpd
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=475366 software:/snmpd
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=469684 software:/svn-buildpackage
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=408097 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=408211 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=410080 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=411987 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=415884 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=420788 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=420789 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=444975 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=466530 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494969 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496514 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496515 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496518 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=496520 software:/sympa
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=408397 software:/tkcvs
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=454347 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=464174 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=464671 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=482306 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=482321 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=483127 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494993 software:/twiki
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=432742 software:/uw-imapd
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=454253 software:/vserver-debiantools
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=482283 software:/vzctl
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=481470 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=493969 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=494849 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=495428 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=495429 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=495542 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497029 software:/wnpp
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430676 software:/xen-utils-common
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430778 software:/xen-utils-common
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=461061 software:/xinetd
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=481027 software:/zim

So this is not exactly what would help bts-link, which doesn't care
which packages I filed bugs against... but I hope you get the idea.
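
For what it's worth, a client could submit that kind of query over HTTP
roughly as sketched below in Python. Assumptions: the endpoint URL is a
placeholder, the store accepts GET requests carrying the query in a
"query" parameter and can answer in the SPARQL JSON results format, and
the function names are made up for the example. The sketch only builds
the request URL and flattens a JSON answer; the HTTP call itself is left
out, and the reporter IRI is assumed to come from a trusted source.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Same query as above, with the reporter IRI turned into a parameter.
QUERY_TEMPLATE = """\
prefix lino: <http://nepomuk.kde.org/ontologies/2008/11/25/lino#>
prefix bom: <http://www.ifi.unizh.ch/ddis/evoont/2008/11/bom#>

select * where {{
  ?a lino:hasReporter <{reporter}> .
  ?a bom:isIssueOf ?c
}} order by ?c
"""


def build_request_url(endpoint: str, reporter_iri: str) -> str:
    """Build a GET URL carrying the query in the 'query' parameter."""
    query = QUERY_TEMPLATE.format(reporter=reporter_iri)
    return endpoint + "?" + urlencode({"query": query})


def rows_from_json(payload: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON result document into {variable: value} rows."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]
```

Applied to an answer equivalent to the listing above, rows_from_json()
would yield one dict per line, with keys "a" (the bug URL) and "c" (the
package).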

All that is left to do is to model relationships like forwarded-upstream
and usertags in an ontology, and integrate that into the RDF store (next
steps as soon as I can code ;)
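
To illustrate what "modelling forwarded-upstream" could boil down to,
here is a tiny Python sketch. The predicate IRI is purely hypothetical
(no such ontology term has been defined yet), the upstream URL used with
it would come from the bug's forwarded field, and the helper names are
made up for the example.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical predicate IRI: no such ontology term exists yet.
FORWARDED_TO = "http://example.org/bts-ontology#forwardedTo"
BUG_URL = "http://bugs.debian.org/cgi-bin/bugreport.cgi?bug={}"


def forwarded_triple(bug: int, upstream_url: str) -> Triple:
    """Describe 'bug N is forwarded to upstream_url' as one RDF triple."""
    return (BUG_URL.format(bug), FORWARDED_TO, upstream_url)


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


def forwarded_bugs(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """Return (bug IRI, upstream IRI) pairs, the part bts-link cares about."""
    return [(s, o) for s, p, o in triples if p == FORWARDED_TO]
```

Once such statements sit in the store, the "all forwarded bugs" question
becomes a single triple pattern on that predicate.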

Still, such an RDF/SPARQL API wouldn't help much with processing speed,
if that's an issue. As for the volume of data, well, I suppose it
wouldn't be more efficient than .summary files either ;)

> 
> Both API approaches can work locally and on debian hosts, and given
> that UDD is still a work in progress, I'd like to "sponsor" the "new
> API".
> 
> Thoughts? Ideas? Others? :)
> 

One more idea : debbugs is obviously not the only bugtracker that gets
queried, so I hope microformats or SPARQL APIs could become widespread
on the bugtracker side, allowing us to get rid of all the custom
regexes and such for each different bugtracker ;)

I hope this adds a useful two cents to the discussion.

Lookin' forward to helping... and introducing standard data
representation formats to make bts-link even more generic.

> Cheers,

Regards,
-- 
Olivier BERGER <olivier.berger at it-sudparis.eu>
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)




More information about the Bts-link-devel mailing list