[Teammetrics-discuss] Please check VCS commits of Debian Med team

Andreas Tille andreas at an3as.eu
Tue Nov 11 08:47:22 UTC 2014


Hi Sukhbir,

since you answered via PM that you would like to work on this, I'd like
to bring in some more ideas on how we could drastically improve the
performance of commitstat:  I committed the scripts
get-{git,svn}-commit-logs to Git, which could be run on alioth in a cron
job.  Fetching all relevant git and svn data for three example teams on
alioth takes:

$ time ./get-git-commit-logs ; time ./get-svn-commit-logs 

real    0m13.969s
user    0m4.616s
sys     0m3.784s

real    0m3.682s
user    0m4.020s
sys     0m0.600s


and as a result each script puts a bzipped yaml file here:

   http://teammetrics.alioth.debian.org/commitstats/

This does not require any ssh access, is reliable and is two or three
orders of magnitude faster than the current approach.  Todo:

   1. Check whether the yaml syntax is correct (I simply wanted
      to demonstrate the principle without checking).
   2. Instead of running the former commitstat.py, fetch the two
      compressed yaml files via http, clean up the database and import
      the yaml data.  It seems the names in the svn log need to be
      resolved.
   3. Install the get-*-commit-logs scripts as monthly (weekly?) cron
      jobs and make sure we really fetch all teams.  Maybe it makes
      sense to have a clone of teammetrics.git on alioth and let the
      shell scripts simply parse etc/teammetrics/commitinfo.conf,
      so a simple `git pull` can update the data fetch process.
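
To illustrate the fetch part of item 2, here is a minimal sketch in
Python 3 using only the standard library.  It is not taken from
teammetrics.git: the function names and the file name in the usage
comment are assumptions, and only the base URL comes from above.  The
yaml parsing and the database import are left out on purpose, since the
yaml layout still needs to be checked (item 1).

```python
import bz2
import urllib.request

# Base URL as given in this mail; the file names below it are not confirmed.
BASE_URL = "http://teammetrics.alioth.debian.org/commitstats/"


def decompress_log(payload: bytes) -> str:
    """Decompress a bzip2 payload and decode it as UTF-8 text."""
    # Undecodable bytes (e.g. odd author names) are replaced, not fatal.
    return bz2.decompress(payload).decode("utf-8", errors="replace")


def fetch_commit_log(name: str, base_url: str = BASE_URL,
                     timeout: float = 60.0) -> str:
    """Download one compressed log via plain http and return the yaml text."""
    with urllib.request.urlopen(base_url + name, timeout=timeout) as response:
        return decompress_log(response.read())


# Hypothetical usage; "git-commits.yaml.bz2" is an assumed file name:
#   text = fetch_commit_log("git-commits.yaml.bz2")
#   # hand `text` to a yaml parser, then import the records into the database
```

A single http request per file would replace the many ssh round trips
of the current approach, which is where the speedup is expected to
come from.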

Hope this was inspiring

     Andreas.

On Thu, Nov 06, 2014 at 07:37:31AM +0100, Andreas Tille wrote:
> Hi,
> 
> more news about this.  After I restarted the job I was watching
> 
>   SELECT count(*) from commitstat ; 
> 
> from time to time and learned that it did not change for more than 20h.
> Now it started changing *very* slowly.  It seems commitstat.py is
> re-evaluating all the repositories from scratch rather than starting
> from the last commit that was inspected in the previous run.  It also
> seems to do this in a very unstructured and slow manner.  I think the
> main part of the job is connecting to and disconnecting from alioth for
> every single bit of work.  I think this is not an acceptable approach to
> answering the question of who committed when to what repository.
> 
> If I imagine a very simple approach of checking out every single
> repository and inspecting the log, this should take no more than, say,
> 10 hours.  Considering that this is quite a brute-force idea, I wonder
> whether a more clever job running on alioth that prepares some kind of
> preprocessed data file, so that only this file needs to be fetched for
> postprocessing and feeding the database, would be *way* more performant
> than what we are currently doing.
> 
> Sukhbir, since you have not answered a single mail here, could you
> confirm whether you
> 
>    a) are busy right now and will come back to this later, or
>    b) are engaged in other fields and have dropped the teammetrics topic?
> 
> This would help me to decide how much time I need to spend on this
> in the next couple of months.
> 
> Kind regards
> 
>         Andreas.
> 
> On Tue, Nov 04, 2014 at 08:44:12PM +0100, Andreas Tille wrote:
> > Hi again,
> > 
> > after more than 30 hours of processing it ended in:
> > 
> > $ commitstat.py -u tille
> > Traceback (most recent call last):
> >   File "./commitstat.py", line 256, in <module>
> >     get_stats()
> >   File "./commitstat.py", line 224, in get_stats
> >     svnstat.fetch_logs(ssh, conn, cur, svn)
> >   File "/home/tille/debian-maintain/alioth/teammetrics/repository/svnstat.py", line 34, in fetch_logs
> >     ftp.put(LOCAL_PATH, 'revisions.hash')                                                   
> >   File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 565, in put
> >     fr = self.file(remotepath, 'wb')
> >   File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 245, in open
> >     t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
> >   File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 635, in _request    
> >     return self._read_response(num)
> >   File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 682, in _read_response
> >     self._convert_status(msg)
> >   File "/usr/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 710, in _convert_status
> >     raise IOError(errno.EACCES, text) 
> > IOError: [Errno 13] Permission denied
> > Exception in thread Thread-1 (most likely raised during interpreter shutdown):
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
> >   File "/usr/lib/python2.7/dist-packages/paramiko/transport.py", line 1574, in run
> > <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'error'
> > 
> > 
> > I restarted the job (really hoping that some commit statement was
> > injected so as not to lose all that data ...)
> > 
> > Kind regards
> > 
> >        Andreas.
> > 
> > On Mon, Nov 03, 2014 at 02:58:24PM +0100, Andreas Tille wrote:
> > > Hi Sukhbir,
> > > 
> > > this problem was fixed in dd89476b6c332bf7d5c6c8ab97013fc1752686fc.
> > > 
> > > However, the sukhbir-guest account is not working on moszumanska.d.o.
> > > It would be great if you could fix this to enable easily running
> > > update-data.
> > > 
> > > Kind regards
> > > 
> > >            Andreas.
> > > 
> > > On Mon, Nov 03, 2014 at 08:43:12AM +0100, Andreas Tille wrote:
> > > > Hi Sukhbir,
> > > > 
> > > > according to the graph, the Debian Med team has *way* fewer commits than
> > > > in the last two years, which is definitely wrong.  Moreover, the stats
> > > > for November did not change compared to the October stats, which
> > > > cannot be right either.
> > > > 
> > > > Could you please check this?
> > > > 
> > > > Thanks a lot
> > > > 
> > > >          Andreas.
> > > > 
> > > > -- 
> > > > http://fam-tille.de
> > > > 
> > > > _______________________________________________
> > > > Teammetrics-discuss mailing list
> > > > Teammetrics-discuss at lists.alioth.debian.org
> > > > http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/teammetrics-discuss
> > > > 

-- 
http://fam-tille.de


