Revamping PET (maybe)
Damyan Ivanov
dmn at debian.org
Tue Nov 17 08:54:34 UTC 2009
Hi,
I was trying to continue Ryan's work on making PET support multiple
source repositories and as a side effect, allow using Git, but so far
I have failed. I looked and looked and could not come up with small
incremental changes. Whenever I tried some approach, it resulted in
massive changes all over.
So, I started writing a design document, currently a wish list for
what PET could be like.
It is in doc/architecture.mdwn in ryan52-multirepo branch, but I copy
it here for discussion.
This is what PET looks like in my dreams. As with any dream, there is
a slight possibility of drifting away from reality. :)
----------------8<----------------
General architecture
====================
* PET works with a RepositorySet
* RepositorySet contains Repositories
* RepositorySet holds a Cache.
* The Cache is opened R/W (exclusive lock) or R/O (shared lock) depending
on initialisation.
* Repositories contain Packages.
* Each Repository knows its containing RepositorySet.
* Repositories can access files, directories, branches and tags.
* Packages contain Files and Directories.
* Each Package knows which Repository contains it.
* Packages are populated with data using Collectors.
* Collectors can retrieve data from the cache or other sources (Repository,
BTS, Archive).
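
The containment and back-reference rules above can be sketched as a small object model. This is only an illustration in Python, not PET code: the class names follow the list above, while the method names (`add_repository`, `add_package`, `all_packages`) and the `data` attribute are assumptions made up for the example.

```python
class Package:
    def __init__(self, name, repository):
        self.name = name
        self.repository = repository      # each Package knows its Repository
        self.data = {}                    # to be filled in by Collectors


class Repository:
    def __init__(self, name, repository_set):
        self.name = name
        self.repository_set = repository_set  # back-reference to the set
        self.packages = {}

    def add_package(self, name):          # hypothetical helper
        pkg = Package(name, self)
        self.packages[name] = pkg
        return pkg


class RepositorySet:
    def __init__(self, cache=None):
        self.cache = cache                # placeholder for the Cache
        self.repositories = {}

    def add_repository(self, name):       # hypothetical helper
        repo = Repository(name, self)
        self.repositories[name] = repo
        return repo

    def all_packages(self):
        """Iterate over the packages of every repository in the set."""
        for repo in self.repositories.values():
            yield from repo.packages.values()
```

With this shape, code holding a Package can reach the Cache through `pkg.repository.repository_set.cache` without any global state.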
Use cases
=========
Displaying data (pet.cgi, cache is R/O)
---------------------------------------
* Works only with data from the Cache
* listing all packages; for each package, the following data is needed:
    * name
    * versions in repository, Debian (several releases, NEW), upstream
    * tags
* packages are shown in groups depending on their classification
Ajax stuff
----------
* pet_chlog.cgi
    * Only retrieves the changelog entry (released or unreleased) of a single
      package
* pkginfo.cgi
    * Retrieves all the info about a single package
Updating the data from repository(ies) (fetch data, cache is R/W)
-----------------------------------------------------------------
Initial data population
-----------------------
* for each repository
    * list all packages
    * collect information about the package
Subsequent data updates (post-commit hook)
------------------------------------------
* for each change set
    * detect affected package(s)
    * update package data (only changes)
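
A minimal sketch of the incremental update, in Python for illustration only. It assumes a hypothetical repository layout of `trunk/<package>/...` and a cache that behaves like a dictionary keyed by package name; neither is specified above. `collect` stands in for whatever gathers the data for one package.

```python
def affected_packages(changed_paths, prefix="trunk/"):
    """Return the set of package names touched by a change set.

    The 'trunk/<package>/...' layout is an assumption for this sketch.
    """
    packages = set()
    for path in changed_paths:
        if not path.startswith(prefix):
            continue                      # not inside a package directory
        name = path[len(prefix):].split("/", 1)[0]
        if name:
            packages.add(name)
    return packages


def update_after_commit(changed_paths, cache, collect):
    """Re-collect data only for the packages the change set touched."""
    for name in sorted(affected_packages(changed_paths)):
        cache[name] = collect(name)
    return cache
```

Packages not mentioned in the change set keep their cached data untouched.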
Collectors
----------
Each collector is responsible for collecting a certain class of information
about the package.
Collected information:
* Repository stuff:
    * watch file: URL and upstream version
    * Changelog:
        * the last released stanza (text and version)
        * signature identity
        * the UNRELEASED stanza (if any) (text and version)
        * possible NOTES and other pseudo-tags
    * tags
* Debian archive
    * versions in different suites
    * NEW
* Bug tracking system
* Classification
    * uses the collected data to put the package in one of several classes
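
One way the collector idea could look, sketched in Python for illustration. Everything here is hypothetical: the `Collector` base class, the `StaticCollector` stand-in that returns canned data, the data keys and the three class names. The real classification rules are not described above, so `classify` is only a placeholder.

```python
class Collector:
    """A collector gathers one class of information about a package."""
    key = None                            # name under which the data is stored

    def collect(self, package_name):
        raise NotImplementedError


class StaticCollector(Collector):
    """Stand-in returning canned data; a real one would query a source."""

    def __init__(self, key, table):
        self.key = key
        self.table = table

    def collect(self, package_name):
        return self.table.get(package_name)


def classify(data):
    """Placeholder rules; the class names are invented for this sketch."""
    if data.get("unreleased_version"):
        return "work in progress"
    if data.get("repo_version") != data.get("archive_version"):
        return "needs upload"
    return "ok"


def populate(package_name, collectors):
    """Run every collector, then classify using the collected data."""
    data = {c.key: c.collect(package_name) for c in collectors}
    data["class"] = classify(data)
    return data
```

Classification runs last because it depends on what the other collectors found.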
Cache
-----
The Cache stores information for later reuse, so that potentially
time-consuming operations do not have to be repeated.
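
The R/W (exclusive lock) versus R/O (shared lock) opening mentioned in the architecture section could be done with advisory file locks. A minimal sketch in Python, assuming a Unix system and a single cache file; the function name `open_cache_file` is made up for the example.

```python
import fcntl


def open_cache_file(path, writable=False):
    """Open the cache file and take an advisory lock on it.

    Readers take a shared lock and may coexist; a writer takes an
    exclusive lock and blocks until all other locks are released.
    Closing the returned file object releases the lock.
    """
    fh = open(path, "r+b" if writable else "rb")
    fcntl.flock(fh, fcntl.LOCK_EX if writable else fcntl.LOCK_SH)
    return fh
```

Several pet.cgi processes could then read at the same time, while an updater waits for them to finish before rewriting the file.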
Cache Interface
---------------
* TODO
Possible implementations
------------------------
Currently we use one big hash streamed with Storable. This is very handy when
operations are to be done on all packages, like in the web frontend.
OTOH, this approach causes the whole file to be rewritten when there is an
update in a single package (post-commit).

    5.7M 2009-11-12 09:16 archive
    421K 2009-11-12 10:15 bts
    2.5M 2009-11-12 10:15 consolidated
    2.4M 2009-11-12 07:16 cpan_dists
    3.7M 2009-11-12 07:16 cpan_index
    4.1M 2009-11-12 04:43 svn
    424K 2009-11-12 10:15 watch

Maybe an SQL(ite) database can be used instead? It would also allow processing
to be done package by package, without reading the whole thing into memory.
The design doesn't care which way of caching is used, as long as it conforms to
the interface.
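
To make the SQLite idea concrete, here is a rough sketch in Python using the standard `sqlite3` module. The interface itself is still a TODO, so the method names (`get`, `set`, `names`), the table layout and the JSON encoding are all assumptions for illustration, not a proposal for the final interface.

```python
import json
import sqlite3


class SQLiteCache:
    """Hypothetical per-package cache: one row per package."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS package "
            "(name TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def set(self, name, data):
        # Only the row of this package is rewritten, not the whole cache.
        with self.db:                     # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO package (name, data) VALUES (?, ?)",
                (name, json.dumps(data)),
            )

    def get(self, name):
        row = self.db.execute(
            "SELECT data FROM package WHERE name = ?", (name,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def names(self):
        rows = self.db.execute("SELECT name FROM package ORDER BY name")
        return [r[0] for r in rows]
```

A Storable-backed class offering the same three methods could be swapped in without the callers noticing, which is the point of keeping the design independent of the caching method.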
----------------8<----------------
Please tell me if you see flaws in this approach. Note that nothing is
written yet, so if you think I am wasting my time, there is not much
really lost.
--
dam