[Collab-qa-commits] r986 - in udd/doc: . sources
neronus-guest at alioth.debian.org
Tue Aug 5 12:12:27 UTC 2008
Author: neronus-guest
Date: 2008-08-05 12:12:27 +0000 (Tue, 05 Aug 2008)
New Revision: 986
Added:
udd/doc/README
udd/doc/sources/
udd/doc/sources/bugs
udd/doc/sources/packages
udd/doc/sources/sources
udd/doc/sources/src-pkg
udd/doc/sources/testing-migrations
udd/doc/sources/upload-history
Removed:
udd/doc/scripts.README
Modified:
udd/doc/config.README
Log:
Added/edited lots of documentation
Added: udd/doc/README
===================================================================
--- udd/doc/README (rev 0)
+++ udd/doc/README 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,38 @@
+The Ultimate Debian Database (UDD)
+
+GOAL:
+The UDD project is an effort to collect a part of the data of the Debian
+Project to support Quality Assurance (QA).
+
+CONTEXT AND PEOPLE:
+It started as a Google Summer of Code (GSoC) 2008 project. The student,
+Christian von Essen, wrote most of the code; the mentor was Lucas Nussbaum,
+and the co-mentors were Stefano Zacchiroli and Marc Brockschmidt.
+
+DESCRIPTION:
+The data we import comes from different sources. Each source has a specific
+type (e.g. popcon). For each such type, there is a program to import this data
+into the database. These programs are called gatherers. Furthermore, there is
+an optional way to update the data (i.e. get it from the source) for each
+source.
+
+The gatherers can be started via src/udd-dispatch.py, which we call the
+dispatcher. The updates can be initiated via src/udd-update.py. Both are
+controlled via a configuration file. See doc/config.README for further
+information.
+
+Each source has its own documentation. See doc/sources/.
+
+The sources are imported into a PostgreSQL database. For the schema,
+see src/setup-db.sql
+
+udd-dispatch.py:
+ udd-dispatch.py <configuration> <source1> [source2 [source3 ...]]
+ This program invokes the gatherers. As its first parameter, it accepts
+ a configuration file (see doc/config.README), while the
+ rest of its arguments specify the sources to be gathered.
+
+udd-update.py:
+ udd-update.py <configuration> <source1> [source2 [source3 ...]]
+ This program is meant to update the sources. See doc/config.README
+ for further information.
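The update step that udd-update.py performs for one source could look
roughly like the sketch below. It is not the actual script: it assumes the
configuration has already been parsed into a nested dict, that the
'update-command' and 'update-timestamps-folder' nodes behave as described in
doc/config.README, and that running the command through a shell is an
acceptable reading of that description. The function name update_source and
the ISO timestamp format are hypothetical.

```python
import datetime
import os
import subprocess


def update_source(config, source_name):
    """Run a source's 'update-command'; return True only if it ran and succeeded."""
    command = config[source_name].get("update-command")
    if command is None:
        # Updating is optional: a source without the node is left alone.
        return False
    # Assumption: the command string is handed to the shell as-is.
    if subprocess.call(command, shell=True) != 0:
        return False
    folder = config.get("general", {}).get("update-timestamps-folder")
    if folder is not None:
        # One file per source, named after the source, holding date and time.
        path = os.path.join(folder, source_name)
        with open(path, "w") as handle:
            handle.write(datetime.datetime.now().isoformat() + "\n")
    return True
```

The timestamp file is only touched after a zero exit status, so a failed
update leaves the previous timestamp in place.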
Modified: udd/doc/config.README
===================================================================
--- udd/doc/config.README 2008-08-04 21:31:50 UTC (rev 985)
+++ udd/doc/config.README 2008-08-05 12:12:27 UTC (rev 986)
@@ -1,31 +1,44 @@
OVERVIEW:
- The configuration file is in YAML format.
+ The configuration file is in YAML format. See src/test.yaml for an example.
- There are two types of stanzas: general, and source names.
+ There are two types of top nodes:
+ - One with name 'general'
+ - The rest specify sources. Here the name corresponds to the name of the
+ source it describes
GENERAL:
In the 'general' part, you specify:
- dbname: The name of the database you will access
- debug: 0 if you want no debug output, 1 otherwise
- - archs: The list of possible architectures we're going to handle
- types:
- The list of possible source types. This consists of a mapping
- from source names, to commands. These commands will be executed
- when udd-dispatch.py is told to gather the matching type of source.
- The command-line executed consists of the string specified for this
- source type, with the configuration file path and the source name
- appended
+ This subtree specifies the possible source types. Each sub-node names a
+ type, and contains exactly one sub-node. This sub-node is a string,
+ either beginning with 'exec' or 'module'. To gather a source, the
+ dispatcher looks into the configuration file for the source's type, and
+ then fetches the string specified in the corresponding sub-node of
+ general->types. If this string begins with 'exec', the rest of the string
+ is executed as a command, with the path of the configuration file and the
+ name of the source appended to the command line. If this string begins
+ with 'module', the dispatcher assumes that the rest of the line names a
+ module, which can be imported into Python. The dispatcher expects the
+ module to provide a function called get_gatherer, which should return an
+ object behaving like the class in src/udd/gatherer.py suggests.
+ - update-timestamps-folder:
+ If specified, each time a source is successfully updated via
+ src/udd-update.py, a file named like the source is created/modified in
+ the folder specified by this stanza, containing the date and time.
SOURCES:
- As said before, all other parts are sources.
- For each source, you have to specify a type, which has to have
- a corresponding entry in general->types. The rest of the entries depend on
- the type of the source (see Below)
-
-SOURCE TYPE packages:
- Each of these sources correspond to a directory in the "dists" directory
- of a Packages mirror. Required specification are:
- * archs: List of architectures you want to read
- * directory: The directory of the release you want to include
- * parts: The parts you want to include
- * distribution: The name of the distribution
+ Each sub-tree whose top node is not called 'general' represents a
+ source. The source's name is specified by the name of the top node. The
+ sub-nodes of that tree specify the configuration of that source. All
+ sources have to have a 'type' node, specifying the type of the source. The
+ type has to have a corresponding entry in general->types. See the GENERAL
+ section.
+ Each source can have an 'update-command' node. If the name of a source is
+ passed as an argument to src/udd-update.py, and the sub-tree of that source
+ contains a node called 'update-command', the sub-node of that node is
+ executed with the system command, and is expected to fetch the data of that
+ source.
+ Other than that, each source type has its own format. See doc/sources/ for
+ the format of each source type.
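How the dispatcher might resolve a source to a gatherer, following the
general->types description above, is sketched here. This is an illustration
under stated assumptions, not the code of udd-dispatch.py: the configuration
is taken to be an already-parsed nested dict, the keyword ('exec' or
'module') is assumed to be separated from the rest of the string by
whitespace, and the function names plan_gatherer and load_get_gatherer are
hypothetical. The arguments that get_gatherer expects are not stated in the
documentation, so the sketch only returns the function without calling it.

```python
import importlib
import shlex


def plan_gatherer(config, source_name, config_path):
    """Return ('exec', argv) or ('module', module_name) for one source."""
    source_type = config[source_name]["type"]
    spec = config["general"]["types"][source_type]
    parts = spec.split(None, 1)
    if len(parts) != 2 or parts[0] not in ("exec", "module"):
        raise ValueError("unsupported type string: %r" % spec)
    kind, rest = parts
    if kind == "exec":
        # The configuration path and the source name are appended
        # to the configured command line.
        return kind, shlex.split(rest) + [config_path, source_name]
    return kind, rest.strip()


def load_get_gatherer(module_name):
    """Import the named module and hand back its get_gatherer function."""
    return importlib.import_module(module_name).get_gatherer
```

For example, with a hypothetical type entry "exec python popcon_gatherer.py"
and a source named debian-popcon, plan_gatherer yields the argument list
python, popcon_gatherer.py, the configuration path, debian-popcon.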
Deleted: udd/doc/scripts.README
===================================================================
--- udd/doc/scripts.README 2008-08-04 21:31:50 UTC (rev 985)
+++ udd/doc/scripts.README 2008-08-05 12:12:27 UTC (rev 986)
@@ -1,13 +0,0 @@
-There will be the following scripts:
- - One script to setup the database from an empty database.
- This script has to create the tables and fill in some
- source-independent values
- - One script to dispatch the other scripts, based on the source
- to gather
- - For every source type a gatherer
-
-All these scripts accept as their first argument the path of the configuration file.
-The setup script doesn't accept any other arguments.
-The dispatch script accepts as arguments names of sources to gather
-The gatherer scripts accept exactly two arguments: The path of the configuration
-file and the name of the source to gather
Added: udd/doc/sources/bugs
===================================================================
--- udd/doc/sources/bugs (rev 0)
+++ udd/doc/sources/bugs 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,27 @@
+DESCRIPTION:
+ The bugs source type imports the data from bugs.debian.org. For this, the
+ Perl module Debbugs is used.
+ We divide bugs into two parts: On the one hand, there are unarchived bugs,
+ and on the other hand, there are archived bugs.
+ For each bug, we save the following information:
+ id
+ package(s) affected by the bug
+ the source of the package affected by the bug (if existing)
+ the date of the arrival of the bug
+ the bug's severity
+ the submitter
+ the owner
+ the subject of the mail submitting the bug
+ the date the bug was last modified
+ whether it affects stable, testing and unstable
+ whether it is archived
+ what versions the bug was found and fixed in
+ what tags belong to the bug
+ which bugs the bug was merged with
+
+ Furthermore, for each user found at bugs.debian.org, we save the usertags
+ they have set.
+
+CONFIGURATION:
+ archived: Should the gatherer import archived or unarchived bugs?
+
Added: udd/doc/sources/packages
===================================================================
--- udd/doc/sources/packages (rev 0)
+++ udd/doc/sources/packages 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,49 @@
+DESCRIPTION:
+ The packages source type handles the Packages.gz files of distributions
+ using the distribution scheme coming from Debian (e.g. Ubuntu).
+
+ If $d is the directory, then the gatherer looks for Packages.gz files
+ in all directories of the form $d/$c/binary-$a/, where $c is an element
+ of the list specified by 'components', and $a is a member of the list
+ specified by 'archs'.
+ If no such directory exists, the gatherer prints a warning message, and
+ ignores the directory. Otherwise, it decompresses the Packages.gz
+ file and imports the information which is found.
+ At the moment, the following fields are imported:
+ Package
+ Version
+ Architecture
+ Maintainer
+ Description
+ Source
+ Essential
+ Depends
+ Recommends
+ Suggests
+ Enhances
+ Pre-Depends
+ Installed-Size
+ Homepage
+ Size
+ MD5Sum
+ If a field is found that is not one of those, a warning is printed.
+ The first 5 fields are considered mandatory, i.e. if one of those
+ fields is missing, an exception is raised, and the program is aborted.
+ If one of the other fields is missing, its value is NULL in the database,
+ the exception being 'Source': If this field is missing, the source
+ name is assumed to be equal to the package name.
+
+TERMINOLOGY:
+ Distribution: The distributing group, e.g. Debian or Ubuntu
+ Release: A release coming from a distribution, e.g. lenny, sid...
+ Component: A part of a release, e.g. main, contrib, non-free, universe,
+ multiverse
+
+CONFIGURATION:
+ directory: The directory denoting the release. Normally a subdirectory of
+ the dists directory of a mirror. Contains the sub-directories of
+ the components
+ components: The list of components to handle.
+ distribution: The distribution these packages belong to
+ release: The release these packages belong to
+ archs: The list of architectures to handle.
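The directory walk described above ($d/$c/binary-$a/ for every component and
architecture, with a warning for directories that do not exist) can be
sketched as follows, together with the fallback for a missing Source field.
This is a minimal illustration, not the gatherer itself: the function names
are hypothetical, a stanza is assumed to be a plain dict of field names to
strings, and the warning text is made up.

```python
import os
import sys


def existing_packages_files(directory, components, archs):
    """List the Packages.gz paths to import, warning about absent directories."""
    found = []
    for component in components:
        for arch in archs:
            subdir = os.path.join(directory, component, "binary-" + arch)
            if not os.path.isdir(subdir):
                print("warning: %s does not exist, skipping" % subdir,
                      file=sys.stderr)
                continue
            found.append(os.path.join(subdir, "Packages.gz"))
    return found


def source_name(stanza):
    """Use the Source field if present, otherwise fall back to Package."""
    return stanza.get("Source", stanza["Package"])
```

A missing Package field raises KeyError here, which matches the rule that
the mandatory fields abort the import.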
Added: udd/doc/sources/sources
===================================================================
--- udd/doc/sources/sources (rev 0)
+++ udd/doc/sources/sources 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,56 @@
+DESCRIPTION:
+ The sources source type handles Sources.gz files of distributions using the
+ distribution scheme coming from Debian (e.g. Ubuntu).
+
+ If $d is the directory, then the gatherer looks for Sources.gz files
+ in all directories of the form $d/$c/source/, where $c is an element
+ of the list specified by 'components'.
+ If no such directory exists, the gatherer prints a warning message, and
+ ignores the directory. Otherwise, it decompresses the Sources.gz
+ file and imports the information which is found.
+ At the moment, the following fields are imported:
+ Format
+ Maintainer
+ Package
+ Version
+ Files
+ Uploaders
+ Binary
+ Architecture
+ Standards-Version
+ Homepage
+ Build-Depends
+ Build-Depends-Indep
+ Build-Conflicts
+ Build-Conflicts-Indep
+ Priority
+ Section
+ Vcs-Arch
+ Vcs-Browser
+ Vcs-Bzr
+ Vcs-Cvs
+ Vcs-Darcs
+ Vcs-Git
+ Vcs-Hg
+ Vcs-Svn
+ X-Vcs-Browser
+ X-Vcs-Bzr
+ X-Vcs-Darcs
+ X-Vcs-Svn
+ If a field is found that is not one of those, a warning is printed.
+ The first 5 fields are considered mandatory, i.e. if one of those
+ fields is missing, an exception is raised, and the program is aborted.
+ If one of the other fields is missing, its value is NULL in the database.
+
+TERMINOLOGY:
+ Distribution: The distributing group, e.g. Debian or Ubuntu
+ Release: A release coming from a distribution, e.g. lenny, sid...
+ Component: A part of a release, e.g. main, contrib, non-free, universe,
+ multiverse
+CONFIGURATION:
+ directory: The directory denoting the release. Normally a subdirectory of
+ the dists directory of a mirror. Contains the sub-directories of
+ the components
+ components: The list of components to handle.
+ distribution: The distribution these packages belong to
+ release: The release these packages belong to
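The field rules above (five mandatory fields, NULL for absent optional
fields, a warning for unknown ones) could be checked along these lines. The
sketch assumes one stanza of a Sources file has already been parsed into a
dict; the names MANDATORY, OPTIONAL and normalize_stanza are hypothetical,
and unknown fields are returned to the caller instead of being printed.

```python
MANDATORY = ("Format", "Maintainer", "Package", "Version", "Files")
OPTIONAL = (
    "Uploaders", "Binary", "Architecture", "Standards-Version", "Homepage",
    "Build-Depends", "Build-Depends-Indep", "Build-Conflicts",
    "Build-Conflicts-Indep", "Priority", "Section",
    "Vcs-Arch", "Vcs-Browser", "Vcs-Bzr", "Vcs-Cvs", "Vcs-Darcs",
    "Vcs-Git", "Vcs-Hg", "Vcs-Svn",
    "X-Vcs-Browser", "X-Vcs-Bzr", "X-Vcs-Darcs", "X-Vcs-Svn",
)


def normalize_stanza(stanza):
    """Return (row, unknown_fields) for one parsed Sources stanza."""
    missing = [field for field in MANDATORY if field not in stanza]
    if missing:
        # A missing mandatory field aborts the import.
        raise KeyError("missing mandatory field(s): " + ", ".join(missing))
    known = MANDATORY + OPTIONAL
    # Fields outside the documented list only deserve a warning.
    unknown = sorted(field for field in stanza if field not in known)
    # Absent optional fields become None, i.e. NULL in the database.
    row = {field: stanza.get(field) for field in known}
    return row, unknown
```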
Added: udd/doc/sources/src-pkg
===================================================================
--- udd/doc/sources/src-pkg (rev 0)
+++ udd/doc/sources/src-pkg 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,4 @@
+DESCRIPTION:
+ This gatherer just calls the 'sources' gatherer first, and then the
+ 'packages' gatherer. Note: the configuration for the sources source
+ type is a subset of the configuration of the packages source type.
Added: udd/doc/sources/testing-migrations
===================================================================
--- udd/doc/sources/testing-migrations (rev 0)
+++ udd/doc/sources/testing-migrations 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,11 @@
+DESCRIPTION:
+ The testing-migrations source type handles the files generated by Lucas'
+ script (http://qa.debian.org/~lucas/). This tells us:
+ When a package was last in testing with which version
+ What version a package has in unstable, and when it was first seen there
+ When a package had the same version in testing and unstable, and which
+ version that was
+
+CONFIGURATION:
+ path: The path of the file that contains the data downloaded from Lucas'
+ site (http://qa.debian.org/~lucas/testing-status.raw)
Added: udd/doc/sources/upload-history
===================================================================
--- udd/doc/sources/upload-history (rev 0)
+++ udd/doc/sources/upload-history 2008-08-05 12:12:27 UTC (rev 986)
@@ -0,0 +1,17 @@
+DESCRIPTION:
+ Upload history is, as the name suggests, the history of uploads of packages.
+ We import this information from http://qa.debian.org/~filippo/ddc/. From
+ there we get for each upload:
+ the package
+ the version
+ the date
+ changed_by
+ the maintainer
+ whether it was an NMU
+ signed by
+ the key id
+
+CONFIGURATION:
+ path: The path where the files coming from godog's site (see above) are
+ saved.
+