[newmaint-site] Matching emails of same contributor ala carnivore - Was: Re: contributors.debian.org milestones

Enrico Zini enrico at enricozini.org
Sat Nov 30 10:09:45 UTC 2013


On Thu, Nov 21, 2013 at 03:53:16PM +0100, Olivier Berger wrote:

> (FYI, I had sent this only to Enrico while Alioth lists were down, but
> hopefully everyone gets it now.)

Ouch, I missed the repost when I replied to you privately. I'll re-reply
here.


> Here's a first attempt at storing in the DB the pubkey's different uids:
> https://gitorious.org/olberger/dc/commits/carnivore
> 
> I haven't yet tied it to the rest of the code. For the moment, it will
> just register 4000+ emails of the 1200+ keys in the keyrings, ready to
> be matched as multiple identifiers of single contributors.

Automatic email association based on DD and DM keyrings is something I
had in mind to do, too, so I'm very happy that you have worked on it.

A comment on the model:

  class GPGPubKey(models.Model):
      keyid = models.CharField(max_length=16)
      fpr = models.CharField(max_length=40)

      def __unicode__(self):
          return self.keyid + ' - ' + self.fpr

Since a fingerprint is already a valid unique identifier, and a keyid is
just a substring of the fingerprint (its trailing hex digits), this can
simply be:

  class GPGPubKey(models.Model):
      fpr = models.CharField(max_length=40, unique=True)

      def __unicode__(self):
          return self.fpr
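
To illustrate why the separate keyid column is redundant, here is a minimal
sketch that derives the keyid on demand. It assumes OpenPGP v4 keys, where
the long key ID is the last 16 hex digits of the 40-digit fingerprint. The
helper name keyid_from_fpr is hypothetical and not part of the dc code, and
the fingerprint in the usage note is made up:

```python
def keyid_from_fpr(fpr, length=16):
    """Return the key ID implied by a v4 OpenPGP fingerprint.

    Assumption: the fingerprint is 40 hex digits and the key ID is its
    trailing `length` digits (16 for the long form, 8 for the short form).
    """
    # Fingerprints are often printed in space-separated groups; normalize.
    normalized = "".join(fpr.split()).upper()
    if len(normalized) != 40 or any(c not in "0123456789ABCDEF" for c in normalized):
        raise ValueError("expected a 40-digit hex fingerprint")
    return normalized[-length:]
```

For example, keyid_from_fpr("0123 4567 89AB CDEF 0123 4567 89AB CDEF 0123 4567")
gives "89ABCDEF01234567", so nothing is lost by storing only fpr.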

In keyringanalyzer.py, I assume the docstring below doesn't describe what
the code actually does, since the code just builds a dict. That is fine,
since I would prefer to use the database uniformly rather than mixing an
RDBMS with anydbm support files:

  def process_keyrings():
      """Process the keyrings and store the extracted data in an anydbm
      file."""

I'm happy to build on your code; when I'm back from lunch I plan to
merge it and tweak the maintenance procedure a bit. I wonder if the
model part is needed at all: it can just be a transient in-memory dict
built during maintenance and thrown away once new Identifier->User
associations have been saved in the database.
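
The transient-dict idea could look roughly like the sketch below. It is only
an illustration under stated assumptions: the input is a list of
(fingerprint, uid) pairs already extracted from the keyrings by some earlier
step, known_users is a plain dict standing in for the existing
Identifier->User table, and the function names are hypothetical rather than
taken from the dc code.

```python
from collections import defaultdict
from email.utils import parseaddr


def emails_by_fingerprint(key_uids):
    """Group the email addresses found in key UIDs by key fingerprint.

    key_uids: iterable of (fingerprint, uid_string) pairs.
    Returns a plain dict: fingerprint -> set of email addresses.
    """
    emails = defaultdict(set)
    for fpr, uid in key_uids:
        _name, address = parseaddr(uid)
        if "@" in address:  # skip UIDs that carry no email address
            # Assumption: addresses are compared case-insensitively.
            emails[fpr].add(address.lower())
    return dict(emails)


def new_associations(emails, known_users):
    """Yield (email, user) pairs that could be saved as new identifiers.

    known_users: dict email -> user (stand-in for the database lookup).
    Only keys whose known emails all point at one single user are used.
    """
    for fpr, addresses in emails.items():
        users = {known_users[a] for a in addresses if a in known_users}
        if len(users) != 1:
            continue  # unknown or conflicting owner: leave for manual review
        (user,) = users
        for address in sorted(addresses):
            if address not in known_users:
                yield address, user
```

Once the yielded pairs have been saved, the dict can simply be dropped, so
no GPGPubKey table is needed for this step.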

There was an extra twist in my plan for mining keyrings: only consider
UIDs that have been signed by at least one key in the Debian keyrings.
That way we can mine any key from any keyserver, and still have a level
of protection against people adding a "leader at debian.org" UID to
their key for trolling points.

Still, we can consider DD and DM keyrings somewhat trusted, in the sense
that if someone uses them for trolling we can easily spot them and deal
with them appropriately.
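
The signed-UID filter described above can be sketched as a simple set
intersection. This is an assumption-laden illustration, not existing code:
it presumes that the signatures on each UID have already been extracted
into a dict mapping the UID string to the set of signing key IDs, and the
name trusted_uids is hypothetical.

```python
def trusted_uids(uid_signatures, debian_key_ids):
    """Keep only UIDs certified by at least one key in the Debian keyrings.

    uid_signatures: dict mapping a UID string to the set of key IDs that
                    signed it (assumed to be extracted beforehand).
    debian_key_ids: set of key IDs present in the Debian keyrings.
    """
    return sorted(
        uid
        for uid, signers in uid_signatures.items()
        if signers & debian_key_ids  # non-empty intersection: a Debian key signed it
    )
```

A UID that only carries signatures from keys outside the keyrings is
dropped, which is the protection against self-added trolling UIDs.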


Ciao,

Enrico

-- 
GPG key: 4096R/E7AD5568 2009-05-08 Enrico Zini <enrico at enricozini.org>