[Pkg-bazaar-commits] ./bzr-stats/unstable r19: Merge trunk.

Wed Jul 16 17:31:20 UTC 2008

------------------------------------------------------------
revno: 19
committer: Jelmer Vernooij <jelmer at samba.org>
branch nick: debian
timestamp: Wed 2008-07-16 19:31:20 +0200
message:
  Merge trunk.
removed:
  test_stats.py
added:
  classify.py
  test_classify.py
modified:
  __init__.py
  debian/changelog
  setup.py
    ------------------------------------------------------------
    revno: 10.1.2
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Mon 2007-11-05 05:26:47 +0100
    message:
      Split out functionality that sorts revids by committer.
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.1
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: stats
    timestamp: Mon 2007-11-05 20:31:49 -0600
    message:
      merge in Jelmer's setup.py and split out sorting functionality.
    added:
      setup.py
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.2
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Tue 2007-11-20 19:18:21 +0100
    message:
      Change name to committer-stats, to allow for other sorts of stats too.
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.3
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sun 2007-12-09 05:21:27 +0100
    message:
      Provide enough information for setup.py register to work.
    modified:
      setup.py
    ------------------------------------------------------------
    revno: 10.2.4
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sun 2008-03-09 16:51:12 +0100
    message:
      Merge upstream.
    modified:
      __init__.py
        ------------------------------------------------------------
        revno: 10.3.1
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: stats
        timestamp: Thu 2008-03-06 10:21:59 +0000
        message:
          Make a lot of imports lazy since they may not actually be used.
        modified:
          __init__.py
        ------------------------------------------------------------
        revno: 10.3.2
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: stats
        timestamp: Fri 2008-03-07 16:55:43 +0000
        message:
          (Wesley J. Landaker) properly import ui before using it.
        modified:
          __init__.py
            ------------------------------------------------------------
            revno: 10.4.1
            committer: Wesley J. Landaker <wjlanda at sandia.gov>
            branch nick: stats
            timestamp: Fri 2008-03-07 09:16:58 -0700
            message:
              Added ui to bzrlib lazy imports.
            modified:
              __init__.py
    ------------------------------------------------------------
    revno: 10.2.5
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sat 2008-06-28 20:54:37 +0200
    message:
      Add code for classifying commits.
    added:
      classify.py
      test_classify.py
    ------------------------------------------------------------
    revno: 10.2.6
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sat 2008-06-28 21:06:38 +0200
    message:
      Rename collapse_by_author -> collapse_by_person since author has an unambiguous meaning
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.7
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sat 2008-06-28 21:08:44 +0200
    message:
      Use get_apparent_author rather than committer.
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.8
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sat 2008-06-28 22:07:45 +0200
    message:
      Add credits command, test classify code by default, add comments to classify code.
    modified:
      __init__.py
      classify.py
    ------------------------------------------------------------
    revno: 10.2.9
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sat 2008-06-28 22:31:01 +0200
    message:
      List contributors with more contributions first.
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.10
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Sat 2008-06-28 22:34:11 +0200
    message:
      Add --show-class argument to stats command.
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.11
    committer: Lukáš Lalinský <lalinsky at gmail.com>
    branch nick: stats
    timestamp: Fri 2008-07-04 14:10:24 +0200
    message:
      Some stats fixes:
      
       - Don't use full name as email when there is no email
       - Use name/email parsing function from bzrlib.config
       - Always use rev.get_apparent_author()
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.12
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Fri 2008-07-04 14:24:39 +0200
    message:
      Remove now-obsolete tests.
    removed:
      test_stats.py
    modified:
      __init__.py
    ------------------------------------------------------------
    revno: 10.2.13
    committer: Jelmer Vernooij <jelmer at samba.org>
    branch nick: trunk
    timestamp: Fri 2008-07-04 14:43:31 +0200
    message:
      Add another progress bar.
    modified:
      __init__.py
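
The author handling described in revnos 10.1.2, 10.2.7, 10.2.9 and 10.2.11 above (split the apparent author into name and email, group revisions by email, list the largest contributors first) can be sketched without bzrlib roughly as follows. This is only an illustration: split_username, group_by_email and rank_contributors are hypothetical names, the regular expression is a simplified stand-in for bzrlib's own username parsing, and the addresses use the plain "@" form rather than the "at" form shown in this archive.

```python
import re

# Simplified stand-in for bzrlib's username parsing (an assumption, not
# the real implementation): optional full name, then an email address
# that may or may not be wrapped in angle brackets.
_EMAIL_RE = re.compile(r'(.*?)\s*<?([\w+.-]+@[\w+.-]+)>?\s*$')


def split_username(username):
    """Return (fullname, email); email is '' when none is present."""
    m = _EMAIL_RE.match(username)
    if m is None:
        # No email found: keep the text as the name and do NOT reuse
        # it as the email.
        return username, ''
    return m.group(1), m.group(2)


def group_by_email(authors):
    """Map each email address to the author strings that used it."""
    groups = {}
    for author in authors:
        email = split_username(author)[1]
        groups.setdefault(email, []).append(author)
    return groups


def rank_contributors(groups):
    """Return (count, email) pairs, largest count first."""
    return sorted(((len(entries), email)
                   for email, entries in groups.items()), reverse=True)
```

For example, rank_contributors(group_by_email(["John Doe <joe@example.com>", "joe@example.com", "John Doe"])) puts joe@example.com first with a count of 2, and the bare "John Doe" entry ends up under the empty email key instead of being treated as an address.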
-------------- next part --------------
=== modified file '__init__.py'

--- a/__init__.py	2007-07-17 16:17:40 +0000
+++ b/__init__.py	2008-07-04 12:43:31 +0000
@@ -2,28 +2,21 @@
 
 import re
 
-from bzrlib import errors, tsort
-from bzrlib.branch import Branch
-import bzrlib.commands
-from bzrlib.config import extract_email_address
-from bzrlib.workingtree import WorkingTree
-
-
-_fullname_re = re.compile(r'(?P<fullname>.*?)\s*<')
-
-def extract_fullname(committer):
-    """Try to get the user's name from their committer info."""
-    m = _fullname_re.match(committer)
-    if m:
-        return m.group('fullname')
-    try:
-        email = extract_email_address(committer)
-    except errors.BzrError:
-        return committer
-    else:
-        # We found an email address, but not a fullname
-        # so there is no fullname
-        return ''
+from bzrlib.lazy_import import lazy_import
+lazy_import(globals(), """
+from bzrlib import (
+    branch,
+    commands,
+    config,
+    errors,
+    option,
+    tsort,
+    ui,
+    workingtree,
+    )
+from bzrlib.plugins.stats.classify import classify_delta
+from itertools import izip
+""")
 
 
 def find_fullnames(lst):
@@ -31,14 +24,14 @@
 
     counts = {}
     for committer in lst:
-        fullname = extract_fullname(committer)
+        fullname = config.parse_username(committer)[0]
         counts.setdefault(fullname, 0)
         counts[fullname] += 1
     return sorted(((count, name) for name,count in counts.iteritems()), reverse=True)
 
 
-def collapse_by_author(committers):
-    """The committers list is sorted by email, fix it up by author.
+def collapse_by_person(committers):
+    """The committers list is sorted by email, fix it up by person.
 
     Some people commit with a similar username, but different email
     address. Which makes it hard to sort out when they have multiple
@@ -56,7 +49,7 @@
     counter_to_info = {}
     counter = 0
     for email, revs in committers.iteritems():
-        fullnames = find_fullnames(rev.committer for rev in revs)
+        fullnames = find_fullnames(rev.get_apparent_author() for rev in revs)
         match = None
         for count, fullname in fullnames:
             if fullname and fullname in name_to_counter:
@@ -85,30 +78,36 @@
             for revs, email, fname in counter_to_info.values()), reverse=True)
 
 
+def sort_by_committer(a_repo, revids):
+    committers = {}
+    pb = ui.ui_factory.nested_progress_bar()
+    try:
+        pb.note('getting revisions')
+        revisions = a_repo.get_revisions(revids)
+        for count, rev in enumerate(revisions):
+            pb.update('checking', count, len(revids))
+            email = config.parse_username(rev.get_apparent_author())[1]
+            committers.setdefault(email, []).append(rev)
+    finally:
+        pb.finished()
+    
+    return committers
+
+
 def get_info(a_repo, revision):
     """Get all of the information for a particular revision"""
-    pb = bzrlib.ui.ui_factory.nested_progress_bar()
-    committers = {}
+    pb = ui.ui_factory.nested_progress_bar()
     a_repo.lock_read()
     try:
         pb.note('getting ancestry')
         ancestry = a_repo.get_ancestry(revision)[1:]
-        pb.note('getting revisions')
-        revisions = a_repo.get_revisions(ancestry)
 
-        for count, rev in enumerate(revisions):
-            pb.update('checking', count, len(ancestry))
-            try:
-                email = extract_email_address(rev.committer)
-            except errors.BzrError:
-                email = rev.committer
-            committers.setdefault(email, []).append(rev)
+        committers = sort_by_committer(a_repo, ancestry)
     finally:
         a_repo.unlock()
         pb.finished()
 
-    info = collapse_by_author(committers)
-    return info
+    return collapse_by_person(committers)
 
 
 def get_diff_info(a_repo, start_rev, end_rev):
@@ -116,7 +115,7 @@
     
     This lets us figure out what has actually changed between 2 revisions.
     """
-    pb = bzrlib.ui.ui_factory.nested_progress_bar()
+    pb = ui.ui_factory.nested_progress_bar()
     committers = {}
     a_repo.lock_read()
     try:
@@ -131,18 +130,19 @@
         for count, rev in enumerate(revisions):
             pb.update('checking', count, len(ancestry))
             try:
-                email = extract_email_address(rev.committer)
+                email = config.extract_email_address(rev.get_apparent_author())
             except errors.BzrError:
-                email = rev.committer
+                email = rev.get_apparent_author()
             committers.setdefault(email, []).append(rev)
     finally:
         a_repo.unlock()
         pb.finished()
 
-    info = collapse_by_author(committers)
+    info = collapse_by_person(committers)
     return info
 
-def display_info(info, to_file):
+
+def display_info(info, to_file, gather_class_stats=None):
     """Write out the information"""
 
     for count, revs, emails, fullnames in info:
@@ -172,23 +172,29 @@
                     to_file.write("''\n")
                 else:
                     to_file.write("%s\n" % (email,))
-
-
-class cmd_statistics(bzrlib.commands.Command):
+        if gather_class_stats is not None:
+            to_file.write('     Contributions:\n')
+            classes, total = gather_class_stats(revs)
+            for name,count in sorted(classes.items(), lambda x,y: cmp((x[1], x[0]), (y[1], y[0]))):
+                to_file.write("     %4.0f%% %s\n" % ((float(count) / total) * 100.0, "Unknown" if name is None else name))
+
+
+class cmd_committer_statistics(commands.Command):
     """Generate statistics for LOCATION."""
 
-    aliases = ['stats']
+    aliases = ['stats', 'committer-stats']
     takes_args = ['location?']
-    takes_options = ['revision']
+    takes_options = ['revision', 
+            option.Option('show-class', help="Show the class of contributions")]
 
     encoding_type = 'replace'
 
-    def run(self, location='.', revision=None):
+    def run(self, location='.', revision=None, show_class=False):
         alternate_rev = None
         try:
-            wt = WorkingTree.open_containing(location)[0]
+            wt = workingtree.WorkingTree.open_containing(location)[0]
         except errors.NoWorkingTree:
-            a_branch = Branch.open(location)
+            a_branch = branch.Branch.open(location)
             last_rev = a_branch.last_revision()
         else:
             a_branch = wt.branch
@@ -208,13 +214,15 @@
                 info = get_info(a_branch.repository, last_rev)
         finally:
             a_branch.unlock()
-        display_info(info, self.outf)
-
-
-bzrlib.commands.register_command(cmd_statistics)
-
-
-class cmd_ancestor_growth(bzrlib.commands.Command):
+        def fetch_class_stats(revs):
+            return gather_class_stats(a_branch.repository, revs)
+        display_info(info, self.outf, fetch_class_stats if show_class else None)
+
+
+commands.register_command(cmd_committer_statistics)
+
+
+class cmd_ancestor_growth(commands.Command):
     """Figure out the ancestor graph for LOCATION"""
 
     takes_args = ['location?']
@@ -223,9 +231,9 @@
 
     def run(self, location='.'):
         try:
-            wt = WorkingTree.open_containing(location)[0]
+            wt = workingtree.WorkingTree.open_containing(location)[0]
         except errors.NoWorkingTree:
-            a_branch = Branch.open(location)
+            a_branch = branch.Branch.open(location)
             last_rev = a_branch.last_revision()
         else:
             a_branch = wt.branch
@@ -247,16 +255,122 @@
                 self.outf.write('%4d, %4d\n' % (revno, cur_parents))
 
 
-bzrlib.commands.register_command(cmd_ancestor_growth)
+commands.register_command(cmd_ancestor_growth)
+
+
+def gather_class_stats(repository, revs):
+    ret = {}
+    total = 0
+    pb = ui.ui_factory.nested_progress_bar()
+    try:
+        repository.lock_read()
+        try:
+            i = 0
+            for delta in repository.get_deltas_for_revisions(revs):
+                pb.update("classifying commits", i, len(revs))
+                for c in classify_delta(delta):
+                    if not c in ret:
+                        ret[c] = 0
+                    ret[c] += 1
+                    total += 1
+                i += 1
+        finally:
+            repository.unlock()
+    finally:
+        pb.finished()
+    return ret, total
+
+
+def display_credits(credits):
+    (coders, documenters, artists, translators) = credits
+    def print_section(name, lst):
+        if len(lst) == 0:
+            return
+        print "%s:" % name
+        for name in lst:
+            print "%s" % name
+        print ""
+    print_section("Code", coders)
+    print_section("Documentation", documenters)
+    print_section("Art", artists)
+    print_section("Translations", translators)
+
+
+def find_credits(repository, revid):
+    """Find the credits of the contributors to a revision.
+
+    :return: tuple with (authors, documenters, artists, translators)
+    """
+    ret = {"documentation": {},
+           "code": {},
+           "art": {},
+           "translation": {},
+           None: {}
+           }
+    repository.lock_read()
+    try:
+        ancestry = filter(lambda x: x is not None, repository.get_ancestry(revid))
+        revs = repository.get_revisions(ancestry)
+        pb = ui.ui_factory.nested_progress_bar()
+        try:
+            for i, (rev,delta) in enumerate(izip(revs, repository.get_deltas_for_revisions(revs))):
+                pb.update("analysing revisions", i, len(revs))
+                # Don't count merges
+                if len(rev.parent_ids) > 1:
+                    continue
+                for c in set(classify_delta(delta)):
+                    author = rev.get_apparent_author()
+                    if not author in ret[c]:
+                        ret[c][author] = 0
+                    ret[c][author] += 1
+        finally:
+            pb.finished()
+    finally:
+        repository.unlock()
+    def sort_class(name):
+        return map(lambda (x,y): x, 
+               sorted(ret[name].items(), lambda x,y: cmp((x[1], x[0]), (y[1], y[0])), reverse=True))
+    return (sort_class("code"), sort_class("documentation"), sort_class("art"), sort_class("translation"))
+
+
+class cmd_credits(commands.Command):
+    """Determine credits for LOCATION."""
+
+    takes_args = ['location?']
+    takes_options = ['revision']
+
+    encoding_type = 'replace'
+
+    def run(self, location='.', revision=None):
+        try:
+            wt = workingtree.WorkingTree.open_containing(location)[0]
+        except errors.NoWorkingTree:
+            a_branch = branch.Branch.open(location)
+            last_rev = a_branch.last_revision()
+        else:
+            a_branch = wt.branch
+            last_rev = wt.last_revision()
+
+        if revision is not None:
+            last_rev = revision[0].in_history(a_branch).rev_id
+
+        a_branch.lock_read()
+        try:
+            credits = find_credits(a_branch.repository, last_rev)
+            display_credits(credits)
+        finally:
+            a_branch.unlock()
+
+
+commands.register_command(cmd_credits)
 
 
 def test_suite():
     from unittest import TestSuite
     from bzrlib.tests import TestLoader
-    import test_stats
     suite = TestSuite()
     loader = TestLoader()
-    testmod_names = ['test_stats']
+    testmod_names = [ 'test_classify']
     suite.addTest(loader.loadTestsFromModuleNames(['%s.%s' % (__name__, i) for i in testmod_names]))
     return suite
 

=== added file 'classify.py'
--- a/classify.py	1970-01-01 00:00:00 +0000
+++ b/classify.py	2008-06-28 20:07:45 +0000
@@ -0,0 +1,48 @@
+"""Classify a commit based on the types of files it changed."""
+
+from bzrlib import urlutils 
+from bzrlib.trace import mutter
+
+
+def classify_filename(name):
+    """Classify a file based on its name.
+    
+    :param name: File path.
+    :return: One of code, documentation, translation or art. 
+        None if determining the file type failed.
+    """
+    # FIXME: Use mime types? Ohcount? 
+    basename = urlutils.basename(name)
+    try:
+        extension = basename.split(".")[1]
+        if extension in ("c", "h", "py", "cpp", "rb", "ac"):
+            return "code"
+        if extension in ("html", "xml", "txt", "rst", "TODO"):
+            return "documentation"
+        if extension in ("po",):
+            return "translation"
+        if extension in ("svg", "png", "jpg"):
+            return "art"
+    except IndexError:
+        if basename in ("README", "NEWS", "TODO", 
+                        "AUTHORS", "COPYING"):
+            return "documentation"
+        if basename in ("Makefile",):
+            return "code"
+
+    mutter("don't know how to classify %s", name)
+    return None
+
+
+def classify_delta(delta):
+    """Determine what sort of changes a delta contains.
+
+    :param delta: A TreeDelta to inspect
+    :return: List with classes found (see classify_filename)
+    """
+    # TODO: This is inaccurate, since it doesn't look at the 
+    # number of lines changed in a file.
+    types = []
+    for d in delta.added + delta.modified:
+        types.append(classify_filename(d[0]))
+    return types

=== modified file 'debian/changelog'
--- a/debian/changelog	2008-07-03 14:42:20 +0000
+++ b/debian/changelog	2008-07-16 17:31:20 +0000
@@ -1,4 +1,4 @@
-bzr-stats (0.0.1~bzr20-1) unstable; urgency=low
+bzr-stats (0.0.1~bzr23-1) unstable; urgency=low
 
   * Initial release. (Closes: #XXXXXX)
 

=== modified file 'setup.py'
--- a/setup.py	2007-10-26 02:33:18 +0000
+++ b/setup.py	2007-12-09 04:21:27 +0000
@@ -8,6 +8,8 @@
       version='0.0.1',
       license='GPL',
       author='John Arbash Meinel',
+      author_email="john at arbash-meinel.com",
+      url="http://launchpad.net/bzr-stats",
       long_description="""
       Simple statistics plugin for Bazaar.
       """,

=== added file 'test_classify.py'
--- a/test_classify.py	1970-01-01 00:00:00 +0000
+++ b/test_classify.py	2008-06-28 18:54:37 +0000
@@ -0,0 +1,22 @@
+from bzrlib.tests import TestCase
+from bzrlib.plugins.stats.classify import classify_filename, classify_delta
+
+
+class TestClassify(TestCase):
+    def test_classify_code(self):
+        self.assertEquals("code", classify_filename("foo/bar.c"))
+
+    def test_classify_documentation(self):
+        self.assertEquals("documentation", classify_filename("bla.html"))
+
+    def test_classify_translation(self):
+        self.assertEquals("translation", classify_filename("nl.po"))
+
+    def test_classify_art(self):
+        self.assertEquals("art", classify_filename("icon.png"))
+
+    def test_classify_unknown(self):
+        self.assertEquals(None, classify_filename("something.bar"))
+
+    def test_classify_doc_hardcoded(self):
+        self.assertEquals("documentation", classify_filename("README"))

=== removed file 'test_stats.py'
--- a/test_stats.py	2007-07-17 16:17:40 +0000
+++ b/test_stats.py	1970-01-01 00:00:00 +0000
@@ -1,17 +0,0 @@
-from bzrlib.tests import TestCase
-from bzrlib.plugins.stats import extract_fullname
-
-
-class TestFullnameExtractor(TestCase):
-    def test_standard(self):
-        self.assertEquals("John Doe", 
-            extract_fullname("John Doe <joe at example.com>"))
-
-    def test_only_email(self):
-        self.assertEquals("",
-            extract_fullname("joe at example.com"))
-
-    def test_only_fullname(self):
-        self.assertEquals("John Doe",
-            extract_fullname("John Doe"))
-
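
The classification added in classify.py, together with the per-class percentages that --show-class prints, can be sketched standalone as below. This is a minimal sketch under stated assumptions, not the plugin's code: it uses posixpath instead of bzrlib.urlutils, omits the mutter() logging, takes a plain list of paths instead of a TreeDelta, and class_percentages is a hypothetical helper. It also spells single-element groups as real tuples, since ("po") is just the string "po" and `in` would then do a substring test.

```python
import posixpath
from collections import Counter

# Extension -> class table, copied from the lists in classify.py
# (minus "TODO", which is handled as a basename below).
CLASSES = {
    "code": ("c", "h", "py", "cpp", "rb", "ac"),
    "documentation": ("html", "xml", "txt", "rst"),
    "translation": ("po",),  # trailing comma makes this a tuple
    "art": ("svg", "png", "jpg"),
}
DOC_BASENAMES = ("README", "NEWS", "TODO", "AUTHORS", "COPYING")


def classify_filename(name):
    """Return the class for a path, or None when it is unknown."""
    basename = posixpath.basename(name)
    if "." not in basename:
        # No extension: fall back to well-known file names.
        if basename in DOC_BASENAMES:
            return "documentation"
        if basename == "Makefile":
            return "code"
        return None
    # Like the original, look at the part after the first dot.
    extension = basename.split(".")[1]
    for cls, extensions in CLASSES.items():
        if extension in extensions:
            return cls
    return None


def class_percentages(paths):
    """Return {class: share in percent} for a list of changed paths."""
    counts = Counter(classify_filename(p) for p in paths)
    total = sum(counts.values())
    return dict((cls, 100.0 * n / total) for cls, n in counts.items())
```

With real tuples, classify_filename("x.p") and classify_filename("Make") both return None; with the parenthesised strings they would have matched "po" and "Makefile" by substring.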