[newmaint-site] contributors.d.o source for git.debian.org

Martín Ferrari tincho at tincho.org
Mon Jan 19 10:31:49 UTC 2015


Hi, sorry for the late answer...


On 10/01/15 15:38, Stefano Zacchiroli wrote:

>> The alternative is to use git log (my preferred way). It is pretty slow
>> (hence the many hours to process git.d.o), but honours whatever the
>> committer put as their id.
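A minimal sketch of a per-repository git log scan follows. It assumes the data source records, for each committer e-mail, the first and last day that e-mail appears. The function names `aggregate` and `scan_repo` are hypothetical, and so is the choice of committer (`%ce`) rather than author e-mail. This is not the actual gitlogs implementation.

```python
import subprocess
from datetime import datetime, timezone

def aggregate(log_text):
    """Map each committer e-mail to the (first, last) UTC date it committed."""
    ranges = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Each line is "<committer email>\t<unix timestamp>".
        email, _, ts = line.rpartition("\t")
        when = datetime.fromtimestamp(int(ts), tz=timezone.utc).date()
        first, last = ranges.get(email, (when, when))
        ranges[email] = (min(first, when), max(last, when))
    return ranges

def scan_repo(git_dir):
    """Run git log over every ref of one (bare) repository and aggregate it."""
    out = subprocess.run(
        ["git", "--git-dir", git_dir, "log", "--all", "--format=%ce%x09%ct"],
        capture_output=True, text=True, check=True,
    ).stdout
    return aggregate(out)
```

Walking the full history of every repository on each run is what makes this approach slow on a host the size of git.d.o.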

> Just a question about this. Your reasoning is based on the assumption
> that the data gathering for contributors.d.o should be stateless (in the
> sense that you restart from scratch every time), right? Because AFAICT
> things might be much more sustainable if, instead of rescanning "git log"
> of every Git repo every time, one just scanned the new commit objects
> w.r.t. the last run.

While I like the fact that it is stateless, that is not the reason. I
am not sure whether it is possible to send incremental data for a person
to contributors.d.o. Also, I don't know how to perform a partial git
scan, given that somebody could push commits authored in the past, before
the last scan.
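One possible answer to the backdated-commit concern, offered only as a sketch, is to select new commits by reachability rather than by date. Assume the scanner stores two things per repository: the commit ids its refs pointed to on the previous run (the hypothetical `tips` helper), and the per-e-mail date ranges it already computed. Then `git log --all --not <old tips>` lists exactly the commits that were not reachable last time, whatever dates they carry, and the new ranges are merged into the stored ones. All names here are hypothetical. The sketch does not settle whether contributors.d.o accepts incremental submissions.

```python
import subprocess

def tips(git_dir):
    """Return the sorted object ids that all refs point to right now."""
    out = subprocess.run(
        ["git", "--git-dir", git_dir, "for-each-ref", "--format=%(objectname)"],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted(set(out.split()))

def log_cmd(git_dir, seen_tips):
    """Build a git log command for commits not reachable from seen_tips.

    Selection is by reachability, not commit date, so a backdated commit
    pushed after the previous run is still included.
    """
    cmd = ["git", "--git-dir", git_dir, "log", "--all", "--format=%ce%x09%ct"]
    if seen_tips:
        cmd += ["--not", *seen_tips]
    return cmd

def merge_ranges(old, new):
    """Combine stored per-e-mail (first, last) ranges with newly found ones."""
    merged = dict(old)
    for email, (first, last) in new.items():
        if email in merged:
            f0, l0 = merged[email]
            merged[email] = (min(f0, first), max(l0, last))
        else:
            merged[email] = (first, last)
    return merged
```

There are two caveats. If a previously recorded tip no longer exists (for example after a force-push followed by garbage collection), `git log` fails on the unknown object, so the scanner would need to fall back to a full scan. Commits dropped by a history rewrite would also remain in the stored ranges.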

> I'm just trying to understand if that option is something that you (or
> others) have considered and ditched, or if it might (in the future,
> when someone actually does that) be a viable approach to collect data
> in batch about *all* Git repositories in a sustainable manner, and be
> done with it once and for all.

It would be good to find ways to optimise this. In any case, I am
currently gathering data for all, or almost all, git repositories on
Alioth. The only repos I exclude from the main collection are those
already scanned by other sources (pkg-perl, collab-maint, etc).
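The exclusion step can be sketched roughly as follows. The sketch assumes each dedicated source (such as qa.debian.org) is described by a list of absolute glob patterns. Each repository path goes to the first source whose patterns match it, and everything else falls through to the main git.debian.org collection. The function name `split_collection` is hypothetical. `PurePosixPath.match` with an absolute pattern must match the whole path, and `*` does not cross a `/`.

```python
from pathlib import PurePosixPath

def split_collection(all_repos, dedicated):
    """Split repo paths between dedicated sources and the main collection.

    dedicated maps a source name to a list of absolute glob patterns.
    Returns (main_repos, {source name: [repos]}).
    """
    assigned = {name: [] for name in dedicated}
    main = []
    for repo in sorted(all_repos):
        path = PurePosixPath(repo)
        for name, patterns in dedicated.items():
            if any(path.match(pattern) for pattern in patterns):
                assigned[name].append(repo)
                break
        else:
            # No dedicated source claims it: keep it in git.debian.org.
            main.append(repo)
    return main, assigned
```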

> That would be awesome. So, I've looked around, and I think that the
> following repos would be a good start to cover QA contributions:

So I created the qa.debian.org data source and made you an admin along
with me. A daily script gathers commits from these repos, and they are
now excluded from git.d.o. Feel free to edit descriptions and such. If
you prefer to run the cronjob yourself, I can also give you the script.

The data is already uploaded at:
https://contributors.debian.org/source/qa.debian.org

Is this good for you?

> What I would like to avoid, though, is having 2 independent deployments
> of your data source for contributors.d.o, that will have to be updated /
> babysat independently at each change.

> How hard would it be to have a single deployment, possibly with 2
> different configuration files (with /usr/bin/find patterns or the like)
> to describe which repos should be considered?

I don't really understand what you are referring to here. What I did
was create a qa.debian.org.conf file in my $HOME that says this:

contribution: commit
method: gitlogs
dirs: /srv/git.debian.org/git/qa/*.git
/srv/git.debian.org/git/collab-qa/*.git
url: https://qa.debian.org/developer.php?login={email}&comaint=yes
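A single deployment could read several such files with a small parser like the one below. The parsing rules are assumptions inferred from this one example, not a documented format. A line starting with `key:` opens a key, and a line without one (like the second `dirs` entry) continues the previous key. The names `parse_conf` and `expand_dirs` are hypothetical.

```python
import glob
import re

# A key line looks like "name: value"; anything else continues the last key.
KEY_RE = re.compile(r"^([A-Za-z_-]+):\s*(.*)$")

def parse_conf(text):
    """Parse a data source config into {key: [values]}."""
    conf = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = KEY_RE.match(line)
        if m:
            key = m.group(1)
            conf[key] = [m.group(2)] if m.group(2) else []
        elif key is not None:
            conf[key].append(line)
    return conf

def expand_dirs(conf):
    """Expand every `dirs` glob pattern into a sorted list of repo paths."""
    repos = []
    for pattern in conf.get("dirs", []):
        repos.extend(sorted(glob.glob(pattern)))
    return repos
```

With this layout, one scanner can loop over one `.conf` file per data source. Per-contributor links then come from `conf["url"][0].format(email=...)`.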


How are you thinking of implementing it?


Tincho.

-- 
Martín Ferrari (Tincho)


