[Collab-qa-commits] r986 - in udd/doc: . sources

neronus-guest at alioth.debian.org neronus-guest at alioth.debian.org
Tue Aug 5 12:12:27 UTC 2008


Author: neronus-guest
Date: 2008-08-05 12:12:27 +0000 (Tue, 05 Aug 2008)
New Revision: 986

Added:
   udd/doc/README
   udd/doc/sources/
   udd/doc/sources/bugs
   udd/doc/sources/packages
   udd/doc/sources/sources
   udd/doc/sources/src-pkg
   udd/doc/sources/testing-migrations
   udd/doc/sources/upload-history
Removed:
   udd/doc/scripts.README
Modified:
   udd/doc/config.README
Log:
Added/edited lots of documentation


Added: udd/doc/README
===================================================================
--- udd/doc/README	                        (rev 0)
+++ udd/doc/README	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,38 @@
+The Ultimate Debian Database (UDD)
+
+GOAL:
+The UDD project is an effort to collect a part of the data of the Debian
+Project to support Quality Assurance (QA).
+
+CONTEXT AND PEOPLE:
+It started as a Google Summer of Code (GSoC) 2008 project. The student was
+Christian von Essen, who wrote most of the code; the mentor was Lucas Nussbaum,
+and the co-mentors were Stefano Zacchiroli and Marc Brockschmidt.
+
+DESCRIPTION:
+The data we import comes from different sources. Each source has a specific
+type (e.g. popcon). For each such type, there is a program to import this data
+into the database. These programs are called gatherers. Furthermore, there is
+an optional way to update the data (i.e. get it from the source) for each
+source.
+
+The gatherers can be started via src/udd-dispatch.py, which we call the
+dispatcher. The updates can be initiated via src/udd-update.py. Both are
+controlled via a configuration file. See doc/config.README for further
+information.
+
+Each source has its own documentation. See doc/sources/.
+
+The sources are imported into a PostgreSQL database. For the schema,
+see src/setup-db.sql.
+
+udd-dispatch.py:
+  udd-dispatch.py <configuration> <source1> [source2 [source3 ...]]
+  This program invokes the gatherers. As its first parameter, it accepts
+  a configuration file (see doc/config.README), while the
+  rest of its arguments specify the sources to be gathered.
+
+udd-update.py:
+  udd-update.py <configuration> <source1> [source2 [source3 ...]]
+  This program is meant to update the sources. See doc/config.README
+  for further information.

Modified: udd/doc/config.README
===================================================================
--- udd/doc/config.README	2008-08-04 21:31:50 UTC (rev 985)
+++ udd/doc/config.README	2008-08-05 12:12:27 UTC (rev 986)
@@ -1,31 +1,44 @@
 OVERVIEW:
-  The configuration file is in YAML format.
+  The configuration file is in YAML format. See src/test.yaml for an example.
 
-  There are two types of stanzas: general, and source names.
+  There are two types of top-level nodes:
+    - One with the name 'general'
+    - The rest specify sources. Here the name of the node corresponds to the
+      name of the source it describes
 
 GENERAL:
   In the 'general' part, you specify:
     - dbname: The name of the database you will access
     - debug: 0 if you want no debug output, 1 otherwise
-    - archs: The list of possible architectures we're going to handle
     - types:
-      The list of possible source types. This consists of a mapping
-      from source names, to commands. These commands will be executed
-      when udd-dispatch.py is told to gather the matching type of source.
-      The command-line executed consists of the string specified for this
-      source type, with the configuration file path and the source name
-      appended
+      This subtree specifies the possible source types. Each sub-node names a
+      type, and contains exactly one sub-node. This sub-node is a string,
+      beginning with either 'exec' or 'module'. To gather a source, the
+      dispatcher looks up the source's type in the configuration file, and
+      then fetches the string specified in the corresponding sub-node of
+      general->types. If this string begins with 'exec', the rest of the string
+      is executed as a command, with the path of the configuration file and the
+      name of the source appended to the command line. If this string begins
+      with 'module', the dispatcher assumes that the rest of the line names a
+      module which can be imported into Python. The dispatcher expects the
+      module to provide a function called get_gatherer, which should return an
+      object behaving like the class in src/udd/gatherer.py suggests.
+    - update-timestamps-folder:
+      If specified, each time a source is successfully updated via
+      src/udd-update.py, a file named after the source is created or modified
+      in the folder specified by this node, containing the date and time.
 
 SOURCES:
-  As said before, all other parts are sources.
-  For each source, you have to specify a type, which has to have
-  a corresponding entry in general->types. The rest of the entries depend on
-  the type of the source (see Below)
-
-SOURCE TYPE packages:
-  Each of these sources correspond to a directory in the "dists" directory
-  of a Packages mirror. Required specification are:
-  * archs: List of architectures you want to read
-  * directory: The directory of the release you want to include
-  * parts: The parts you want to include
-  * distribution: The name of the distribution
+  Each sub-tree whose top node is not called 'general' represents a
+  source. The source's name is specified by the name of the top node. The
+  sub-nodes of that tree specify the configuration of that source. All
+  sources must have a 'type' node, specifying the type of the source. The
+  type must have a corresponding entry in general->types. See the GENERAL
+  section.
+  Each source can have an 'update-command' node. If the name of a source is
+  passed as an argument to src/udd-update.py, and the sub-tree of that source
+  contains a node called 'update-command', the value of that node is
+  executed as a system command, and is expected to fetch the data of that
+  source.
+  Other than that, each source type has its own format; see doc/sources/ for
+  the format of each source type.
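
The type lookup described in the GENERAL section can be sketched roughly as
follows. This is only an illustration, not the dispatcher's actual code: the
function name resolve_gatherer, the plain-dict stand-in for the parsed YAML
file, and the example source, module and script names are all assumptions.

```python
import shlex


def resolve_gatherer(config, source_name, config_path):
    """Map a source to ('exec', argv) or ('module', module_name)."""
    # Every source must carry a 'type' node ...
    source_type = config[source_name]["type"]
    # ... and that type must have an entry in general->types.
    spec = config["general"]["types"][source_type]
    kind, _, rest = spec.partition(" ")
    rest = rest.strip()
    if kind == "exec":
        # The configuration path and the source name are appended
        # to the command line.
        return "exec", shlex.split(rest) + [config_path, source_name]
    if kind == "module":
        # The rest of the string names an importable Python module.
        return "module", rest
    raise ValueError("type %r: expected 'exec' or 'module' in %r"
                     % (source_type, spec))


# Hypothetical configuration, written as the dict a YAML parser would return.
config = {
    "general": {
        "dbname": "udd",
        "types": {
            "packages": "module udd.packages_gatherer",
            "popcon": "exec python popcon_gatherer.py",
        },
    },
    "debian-lenny": {"type": "packages"},
    "popcon-src": {"type": "popcon"},
}

print(resolve_gatherer(config, "debian-lenny", "test.yaml"))
print(resolve_gatherer(config, "popcon-src", "test.yaml"))
```

For the 'module' case, a caller would then import the named module and call
its get_gatherer function, as the text above describes.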

Deleted: udd/doc/scripts.README
===================================================================
--- udd/doc/scripts.README	2008-08-04 21:31:50 UTC (rev 985)
+++ udd/doc/scripts.README	2008-08-05 12:12:27 UTC (rev 986)
@@ -1,13 +0,0 @@
-There will be the following scripts:
- - One script to setup the database from an empty database.
-   This script has to create the tables and fill in some
-   source-independent values
- - One script to dispatch the other scripts, based on the source
-   to gather
- - For every source type a gatherer
-
-All these scripts accept as their first argument the path of the configuration file.
-The setup script doesn't accept any other arguments.
-The dispatch script accepts as arguments names of sources to gather
-The gatherer scripts accept exactly two arguments: The path of the configuration
-file and the name of the source to gather

Added: udd/doc/sources/bugs
===================================================================
--- udd/doc/sources/bugs	                        (rev 0)
+++ udd/doc/sources/bugs	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,27 @@
+DESCRIPTION:
+  The bugs source type imports the data from bugs.debian.org. For this, the
+  Perl module Debbugs is used.
+  We divide bugs into two parts: on the one hand, there are unarchived bugs,
+  and on the other hand, there are archived bugs.
+  For each bug, we save the following information:
+    id
+    package(s) affected by the bug
+    the source of the package affected by the bug (if existing)
+    the date of the arrival of the bug
+    the bug's severity
+    the submitter
+    the owner
+    the subject of the mail submitting the bug
+    the date the bug was last modified
+    whether it affects stable, testing and unstable
+    whether it is archived
+    which versions the bug was found and fixed in
+    which tags belong to the bug
+    which bugs the bug was merged with
+
+  Furthermore, for each user found at bugs.debian.org, we save the usertags
+  they have set.
+
+CONFIGURATION:
+  archived: Should the gatherer import archived or unarchived bugs?
+  

Added: udd/doc/sources/packages
===================================================================
--- udd/doc/sources/packages	                        (rev 0)
+++ udd/doc/sources/packages	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,49 @@
+DESCRIPTION:
+  The packages source type handles the Packages.gz files of distributions
+  using the distribution scheme coming from Debian (e.g. Ubuntu).
+
+  If $d is the directory, then the gatherer looks for Packages.gz files
+  in all directories of the form $d/$c/binary-$a/, where $c is an element
+  of the list specified by 'components', and $a is a member of the list
+  specified by 'archs'.
+  If no such directory exists, the gatherer prints a warning message, and
+  ignores the directory. Otherwise, it decompresses the Packages.gz
+  file and imports the information it contains.
+  At the moment, the following fields are imported:
+    Package
+    Version
+    Architecture
+    Maintainer
+    Description
+    Source
+    Essential
+    Depends
+    Recommends
+    Suggests
+    Enhances
+    Pre-Depends
+    Installed-Size
+    Homepage
+    Size
+    MD5Sum
+  If a field is found which is not one of those, a warning is printed.
+  The first 5 fields are considered mandatory, i.e. if one of those
+  fields is missing, an exception is raised, and the program aborts.
+  If one of the other fields is missing, its value is NULL in the database,
+  the exception being 'Source': if this field is missing, the source
+  name is assumed to be equal to the package name.
+
+TERMINOLOGY:
+  Distribution: The distributing group, e.g. Debian or Ubuntu
+  Release: A release coming from a distribution, e.g. lenny, sid...
+  Component: A part of a release, e.g. main, contrib, non-free, universe,
+             multiverse
+
+CONFIGURATION:
+  directory: The directory denoting the release. Normally a subdirectory of
+             the dists directory of a mirror. Contains the sub-directories of
+             the components
+  components: The list of components to handle.
+  distribution: The distribution these packages belong to
+  release: The release these packages belong to
+  archs: The list of architectures to handle.
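
The directory scheme and the 'Source' default described above can be
sketched as follows. This is a minimal illustration under assumptions: the
helper names packages_paths and with_default_source are hypothetical and do
not come from the gatherer's code, and the example directory is made up.

```python
import os


def packages_paths(directory, components, archs):
    """Yield candidate paths of the form $d/$c/binary-$a/Packages.gz."""
    for component in components:
        for arch in archs:
            yield os.path.join(directory, component,
                               "binary-" + arch, "Packages.gz")


def with_default_source(record):
    """Return a copy of record in which 'Source' falls back to 'Package'."""
    result = dict(record)
    if "Source" not in result:
        # A missing Source field means the source package has the
        # same name as the binary package.
        result["Source"] = result["Package"]
    return result


for path in packages_paths("dists/lenny", ["main", "contrib"], ["i386"]):
    # A real gatherer would warn and skip if the directory does not exist.
    print(path)

print(with_default_source({"Package": "foo", "Version": "1.0"}))
```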

Added: udd/doc/sources/sources
===================================================================
--- udd/doc/sources/sources	                        (rev 0)
+++ udd/doc/sources/sources	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,56 @@
+DESCRIPTION:
+  The sources source type handles Sources.gz files of distributions using the
+  distribution scheme coming from Debian (e.g. Ubuntu).
+
+  If $d is the directory, then the gatherer looks for Sources.gz files
+  in all directories of the form $d/$c/source/, where $c is an element
+  of the list specified by 'components'.
+  If no such directory exists, the gatherer prints a warning message, and
+  ignores the directory. Otherwise, it decompresses the Sources.gz
+  file and imports the information it contains.
+  At the moment, the following fields are imported:
+    Format
+    Maintainer
+    Package
+    Version
+    Files
+    Uploaders
+    Binary
+    Architecture
+    Standards-Version
+    Homepage
+    Build-Depends
+    Build-Depends-Indep
+    Build-Conflicts
+    Build-Conflicts-Indep
+    Priority
+    Section
+    Vcs-Arch
+    Vcs-Browser
+    Vcs-Bzr
+    Vcs-Cvs
+    Vcs-Darcs
+    Vcs-Git
+    Vcs-Hg
+    Vcs-Svn
+    X-Vcs-Browser
+    X-Vcs-Bzr
+    X-Vcs-Darcs
+    X-Vcs-Svn
+  If a field is found which is not one of those, a warning is printed.
+  The first 5 fields are considered mandatory, i.e. if one of those
+  fields is missing, an exception is raised, and the program aborts.
+  If one of the other fields is missing, its value is NULL in the database.
+
+TERMINOLOGY:
+  Distribution: The distributing group, e.g. Debian or Ubuntu
+  Release: A release coming from a distribution, e.g. lenny, sid...
+  Component: A part of a release, e.g. main, contrib, non-free, universe,
+             multiverse
+CONFIGURATION:
+  directory: The directory denoting the release. Normally a subdirectory of
+             the dists directory of a mirror. Contains the sub-directories of
+             the components
+  components: The list of components to handle.
+  distribution: The distribution these packages belong to
+  release: The release these packages belong to
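
The field handling described for the sources type (mandatory fields,
warnings for unknown fields, NULL for missing optional fields) might look
roughly like this. It is a sketch only: normalize_source_record and the two
tuples are hypothetical names, and None stands in for a database NULL.

```python
import warnings

# The first five fields of the list above are mandatory.
MANDATORY = ("Format", "Maintainer", "Package", "Version", "Files")
OPTIONAL = (
    "Uploaders", "Binary", "Architecture", "Standards-Version", "Homepage",
    "Build-Depends", "Build-Depends-Indep", "Build-Conflicts",
    "Build-Conflicts-Indep", "Priority", "Section", "Vcs-Arch",
    "Vcs-Browser", "Vcs-Bzr", "Vcs-Cvs", "Vcs-Darcs", "Vcs-Git", "Vcs-Hg",
    "Vcs-Svn", "X-Vcs-Browser", "X-Vcs-Bzr", "X-Vcs-Darcs", "X-Vcs-Svn",
)


def normalize_source_record(record):
    """Check one Sources stanza and fill missing optional fields with None."""
    for field in MANDATORY:
        if field not in record:
            raise ValueError("mandatory field missing: " + field)
    for field in record:
        if field not in MANDATORY and field not in OPTIONAL:
            # Unknown fields are reported but not imported.
            warnings.warn("unknown field: " + field)
    # None stands in for NULL in the database.
    return {field: record.get(field) for field in MANDATORY + OPTIONAL}


row = normalize_source_record({
    "Format": "1.0", "Maintainer": "Jane Doe <jane@example.org>",
    "Package": "foo", "Version": "1.0-1", "Files": "...",
})
print(row["Package"], row["Homepage"])
```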

Added: udd/doc/sources/src-pkg
===================================================================
--- udd/doc/sources/src-pkg	                        (rev 0)
+++ udd/doc/sources/src-pkg	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,4 @@
+DESCRIPTION:
+  This gatherer simply calls the 'sources' gatherer first, and then the
+  'packages' gatherer. Note: the configuration for the sources source
+  type is a subset of the configuration of the packages source type.

Added: udd/doc/sources/testing-migrations
===================================================================
--- udd/doc/sources/testing-migrations	                        (rev 0)
+++ udd/doc/sources/testing-migrations	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,11 @@
+DESCRIPTION:
+  The testing-migrations source type handles the files generated by Lucas'
+  script (http://qa.debian.org/~lucas/). This tells us:
+    When a package was last in testing, and with which version
+    What version a package has in unstable, and when it was first seen there
+    When a package had the same version in testing and unstable, and which
+      version that was
+
+CONFIGURATION:
+  path: The path of the file which contains the data downloaded from Lucas'
+        site (http://qa.debian.org/~lucas/testing-status.raw)

Added: udd/doc/sources/upload-history
===================================================================
--- udd/doc/sources/upload-history	                        (rev 0)
+++ udd/doc/sources/upload-history	2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,17 @@
+DESCRIPTION:
+  Upload history is, as the name suggests, the history of uploads of packages.
+  We import this information from http://qa.debian.org/~filippo/ddc/. From
+  there we get for each upload:
+    the package
+    the version
+    the date
+    changed_by
+    the maintainer
+    whether it was an NMU
+    signed by
+    the key id
+
+CONFIGURATION:
+  path: The path where the files coming from godog's site (see above) are
+        saved.
+



