[Teammetrics-discuss] NNTPStat completed successfully.

Andreas Tille andreas at an3as.eu
Sat Nov 5 17:47:19 UTC 2011


On Sat, Nov 05, 2011 at 11:44:52AM +0530, Sukhbir Singh wrote:
> Hmm, ok so:
> 
> teammetrics=> SELECT COUNT(*) FROM listarchives WHERE archive_date
> <'1995-01-01';
> 
> count
> -------
>      9
> (1 row)
> 
> So 9 such messages. Out of which three are spam.

Well, these are just the messages which are *obviously* wrong.  There
is little chance of checking whether a message that should be dated,
say, 2009 is wrongly placed in, for instance, 2002.  Above we just have
proof that wrong dates exist.  There is nearly no way to check how many
occurrences of this problem we have.
 
> I understand your point about getting the message date from the
> archive date but don't you think nine special cases are not special
> enough from 1879288 messages for changing the standard way of getting
> the message date? :)

As I said: your estimate of the number is most probably wrong and we do
not even have a reasonable way to estimate how wrong it is.
 
> If we want to resort to that approach, I can change it for the lists
> on Alioth but it won't be possible for Gmane archives, because getting
> the date from the header is the only way.

I agree that Gmane probably gives us little chance of detecting this.
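
For archives that do carry a month of their own, a rough cross-check
between the header date and the archive month could look like the sketch
below.  The function name, its parameters and the one-month slack are
assumptions made for illustration; this is not what liststat.py does.

```python
from email.utils import parsedate_to_datetime


def date_is_suspicious(date_header, archive_year, archive_month, slack_months=1):
    """Return True if the Date header disagrees with the archive month.

    Hypothetical helper: the name and the slack_months default are
    assumptions, not part of the existing scripts.
    """
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # An unparsable header is treated as suspicious as well.
        return True
    # Count months since year 0 so that year boundaries are handled.
    header_index = sent.year * 12 + (sent.month - 1)
    archive_index = archive_year * 12 + (archive_month - 1)
    return abs(header_index - archive_index) > slack_months


ok = date_is_suspicious("Sat, 05 Nov 2011 17:47:19 +0000", 2011, 11)
old = date_is_suspicious("Tue, 01 Jan 1980 00:00:00 +0000", 2011, 11)
broken = date_is_suspicious("not a date", 2011, 11)
```

The slack is there because a message sent around midnight at the end of
a month may legitimately land in the neighbouring archive.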
 
> > I somehow have the impression that updatenames.py fails in some
> > circumstances.
> 
> Well, it does an exact match only and so it will only do what we tell
> it to! What kind of changes would you like to make in that?

Anything that maps all versions of "me" to "Andreas Tille".  Perhaps
we need some "author like '%...'" in addition.
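
A minimal sketch of what I mean, using an in-memory SQLite table as a
stand-in for the real database.  Only the table name listarchives and
the column name author come from this thread; the function
normalize_author, the sample rows and the pattern are assumptions for
illustration, not the current behaviour of updatenames.py.

```python
import sqlite3


def normalize_author(conn, pattern, canonical):
    """Rewrite every author matching the LIKE pattern to the canonical name.

    Hypothetical helper; returns the number of rows that were changed.
    """
    cur = conn.execute(
        "UPDATE listarchives SET author = ? "
        "WHERE author LIKE ? AND author <> ?",
        (canonical, pattern, canonical),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listarchives (author TEXT)")
conn.executemany(
    "INSERT INTO listarchives (author) VALUES (?)",
    [("Andreas Tille",), ("Tille, Andreas",), ("andreas tille",), ("Sukhbir Singh",)],
)

changed = normalize_author(conn, "%tille%", "Andreas Tille")
authors = sorted(row[0] for row in conn.execute("SELECT author FROM listarchives"))
```

Note that LIKE is case-insensitive for ASCII in SQLite but case-sensitive
in PostgreSQL, so there we would probably need ILIKE.  A pattern this
loose would also catch any other author whose name contains "tille", so
it should be an addition to the exact match and not a replacement.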
 
> We are missing archives for some years and that is indeed a cause of
> worry, but then, it's your call! If you feel that the web archives
> method is better, we will go with that. Or wait for the mbox
> archives.

Could you estimate the time and effort needed for this, so that we can
compare which comes first: real mboxes or web archives?

> (Though we did seem to be missing some authors from the web
> archives method IIRC)

Do you think so?  I do not remember.
 
> Er, sorry but you might have to start it again :'( . Last night, I
> needed to run liststat.py but I gave the command for commitstat.py by
> mistake and then I later cleared the files on vasks because I didn't
> know you were running it. My mistake, please run it again. Sorry!

No problem.  But wait a moment: I get a drink at the next DebConf when
we meet again! :-)  That's the usual punishment for mistakes like this!

Kind regards

       Andreas.
 

-- 
http://fam-tille.de


