[newmaint-site] contributors.d.o source for git.debian.org

Stefano Zacchiroli zack at debian.org
Sat Jan 10 15:38:01 UTC 2015


On Fri, Jan 09, 2015 at 12:44:33PM +0000, Martín Ferrari wrote:
> This is exactly the problem. I am sorry I forgot to mention this. Let me
> explain what's going on:

OK, got it, thanks.

> The alternative is to use git log (my preferred way). It is pretty slow
> (hence the many hours to process git.d.o), but honours whatever the
> committer put as their id.

Just a question about this. Your reasoning is based on the assumption
that the data gathering for contributors.d.o should be stateless (in the
sense that you restart from scratch every time), right? Because AFAICT
things could be much more sustainable if, instead of rescanning the full
"git log" of every Git repo each time, one scanned only the commit
objects that are new w.r.t. the last run.
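To make the idea concrete, here is a minimal Python sketch of such a stateful scan. It assumes a JSON state file recording, per repository, the branch tips seen at the previous run. All function names and the state layout are hypothetical, not part of the existing contributors.d.o code. Each run asks "git log" only for commits reachable from the current tips but not from the recorded ones (git log NEW... --not OLD...), so an unchanged repository yields an empty log.

```python
# Sketch only: hypothetical stateful "git log" scanner, not contributors.d.o code.
import json
import subprocess
from pathlib import Path
from typing import Dict, List, NamedTuple


class Commit(NamedTuple):
    sha: str
    name: str
    email: str
    timestamp: int  # author date, seconds since the epoch


# One record per line, fields separated by NUL bytes (%x00).
LOG_FORMAT = "--format=%H%x00%an%x00%ae%x00%at"


def build_log_args(current_tips: List[str], previous_tips: List[str]) -> List[str]:
    """Commits reachable from the current branch tips but not from the tips
    recorded at the previous run; without previous state it is a full scan."""
    args = ["git", "log", LOG_FORMAT, *current_tips]
    if previous_tips:
        args += ["--not", *previous_tips]  # negate every tip that follows
    return args + ["--"]  # end of revisions, no path arguments


def parse_log(output: str) -> List[Commit]:
    """Turn NUL-separated "git log" records into Commit tuples."""
    commits = []
    for line in output.split("\n"):
        if line:
            sha, name, email, ts = line.split("\x00")
            commits.append(Commit(sha, name, email, int(ts)))
    return commits


def git(repo: Path, args: List[str]) -> str:
    """Run a git command inside repo and return its decoded stdout."""
    return subprocess.run(args, cwd=repo, capture_output=True, text=True,
                          errors="replace", check=True).stdout


def scan_incrementally(repo: Path, state_file: Path) -> List[Commit]:
    """Return only the commits that are new since the last run, then
    record the current tips. Assumed state layout: {repo path: [tips]}."""
    state: Dict[str, List[str]] = (
        json.loads(state_file.read_text()) if state_file.exists() else {})
    # Only branches are considered here, to keep the example simple.
    tips = sorted(set(git(repo, ["git", "for-each-ref",
                                 "--format=%(objectname)", "refs/heads"]).split()))
    if not tips:  # empty repository
        return []
    commits = parse_log(git(repo, build_log_args(tips, state.get(str(repo), []))))
    state[str(repo)] = tips
    state_file.write_text(json.dumps(state))
    return commits
```

This also shows one of the drawbacks of going stateful: if a recorded tip disappears (say, garbage-collected after a force-push), "git log" fails on it (here raising subprocess.CalledProcessError) and that repository needs a full rescan; rewritten history may also get counted twice.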

I'm not asking you to implement it that way, of course, because:
a) if I'm that interested I should put my code where my mouth is :-),
and b) going stateful comes with its own drawbacks.

I'm just trying to understand whether that option is something that you
(or others) have considered and ditched, or whether it might (in the
future, when someone actually does it) be a viable approach to collect
data about *all* Git repositories in batch, in a sustainable manner, and
be done with it once and for all.

> The obvious drawback is that non-packaging repos get ignored. I have
> just checked: no contributions are taken from the qa/debsources git repo.

Thanks for double-checking, I appreciate it.

> I did not know about the content of the QA repos, but now that you
> pointed it out to me, I am sure that the way to go is to handle them
> separately. If you give me some guidelines, I can implement this easily
> as a new data source.

That would be awesome. So, I've looked around, and I think that the
following repos would be a good start to cover QA contributions:

  zack@moszumanska:~$ find /git/qa/ /git/collab-qa/ -type d -name '*.git'
  /git/qa/dose.git
  /git/qa/distro-tracker.git
  /git/qa/debmetrics.git
  /git/qa/qa.git
  /git/qa/debsources.git
  /git/qa/jenkins.debian.net.git
  /git/qa/bls.git
  /git/collab-qa/dhistory.git
  /git/collab-qa/udd.git
  /git/collab-qa/popcon-monitor.git
  /git/collab-qa/cloud-scripts.git
  /git/collab-qa/collab-qa.git
  /git/collab-qa/cloud-tools.git
  zack@moszumanska:~$

What I would like to avoid, though, is having 2 independent deployments
of your data source for contributors.d.o, which would have to be updated /
babysat separately at each change.

How hard would it be to have a single deployment, possibly with 2
different configuration files (with /usr/bin/find patterns or the like)
to describe which repos should be considered?
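As an illustration of the single-deployment idea, here is a minimal Python sketch: each data source gets its own small config file listing shell-style glob patterns (standing in for the find patterns mentioned above), and one process expands all of them. The config format (one pattern per line, "#" comments), the source names and the functions are assumptions made up for this example.

```python
# Sketch only: one deployment, several per-source config files of glob patterns.
import glob
import os
from typing import Dict, List


def read_patterns(config_text: str) -> List[str]:
    """One glob pattern per line; blank lines and '#' comments are ignored."""
    patterns = []
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


def expand(patterns: List[str]) -> List[str]:
    """Directories matching any pattern, deduplicated and sorted."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(pattern, recursive=True):  # "**" is allowed
            if os.path.isdir(path):  # like find's "-type d"
                found.add(os.path.normpath(path))
    return sorted(found)


def repos_per_source(configs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each data-source name (e.g. "packaging", "qa") to its repos."""
    return {name: expand(read_patterns(text)) for name, text in configs.items()}
```

A QA config could then hold just the two lines /git/qa/*.git and /git/collab-qa/*.git, matching the repositories listed by the find command above, while the packaging config keeps its own patterns; both are read by the same code, so there is still only one deployment to update.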

TIA,
Cheers.
-- 
Stefano Zacchiroli  . . . . . . .  zack at upsilon.cc . . . . o . . . o . o
Maître de conférences . . . . . http://upsilon.cc/zack . . . o . . . o o
Former Debian Project Leader  . . @zack on identi.ca . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »