[Popcon-developers] Raw popcon data repo schema and access questions
Pavan Gupta
pg8p at virginia.edu
Thu Sep 18 15:21:24 UTC 2014
On Thu, Sep 18, 2014 at 11:01 AM, Bill Allombert <ballombe at debian.org>
wrote:
> On Thu, Sep 18, 2014 at 10:06:07AM -0400, Pavan Gupta wrote:
> > On my first question, I was essentially looking for a database schema. My
> > goal was to discern whether you were collecting any kind of metadata about
> > the final data you are aggregating and presenting. Generally, answering
> > that question openly may still be of value for people interested in
> > understanding how the data they deliver to popcon is being collected and
> > processed.
>
> We do not use a database. Instead we store reports (meaning the output of
> /usr/sbin/popularity-contest) as they are received, keeping exactly one
> by HOSTID (see /etc/popularity-contest.conf).
> The amount of data stored is about 10GB. We update 1/7th of that every day.
>
>
Having the HOSTID effectively tagged to the data is fantastic. You can
certainly do some basic clustering with that. But with the data rolling
over every week, you lose some of the power to see how installing one
package might predict installing others, especially if the historical raw
data is lost forever.
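
To check my understanding of the "exactly one report per HOSTID" scheme, here
is a minimal sketch in Python. Everything in it is my own assumption rather
than the actual popcon server code: the names (host_id, LatestReportStore) are
hypothetical, I assume the report's first line carries the host ID as an
"ID:<value>" token, and I use an in-memory dict where the real server keeps
files on disk.

```python
def host_id(report: str) -> str:
    """Return the host ID from the report's first line.

    Assumption: the header contains a whitespace-separated 'ID:<value>' token.
    """
    lines = report.splitlines()
    if not lines:
        raise ValueError("empty report")
    for token in lines[0].split():
        if token.startswith("ID:"):
            return token[len("ID:"):]
    raise ValueError("no ID field in report header")


class LatestReportStore:
    """Keeps exactly one report per host ID (hypothetical in-memory stand-in)."""

    def __init__(self) -> None:
        self._reports: dict[str, str] = {}

    def submit(self, report: str) -> None:
        # A newer report from the same host replaces the older one,
        # so the previous submission is no longer recoverable.
        self._reports[host_id(report)] = report

    def get(self, hid: str) -> str:
        return self._reports[hid]

    def __len__(self) -> int:
        return len(self._reports)
```

Submitting two reports with the same ID leaves only the second in the store,
which is the loss of history I am describing above.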
> > And no problem, I shall look to solve this problem elsewhere. Keep up the
> > great work!
>
> Well, there might be people on this list with more experience than me in
> datamining popcon.
>
And it looks like this may already have been implemented in Ubuntu's
software-center recommendation service. Some of their architecture documents
call out an overlap with popcon. Why they would reinvent the wheel is
puzzling to me, but it looks like they might be collecting data differently
and in more depth to achieve a popcon-like objective with recommendations on
top. Reading their material, I couldn't help but think that writing up a
fresh strategy for recommendations would be wasteful. Has this team
coordinated with them at all?
> Cheers,
> --
> Bill. <ballombe at debian.org>
>
> Imagine a large red swirl here.
>
-P