[Pkg-bazaar-commits] ./bzr/unstable r321: doc: revfile storage and related things
Martin Pool
mbp at sourcefrog.net
Fri Apr 10 07:51:50 UTC 2009
------------------------------------------------------------
revno: 321
committer: Martin Pool <mbp at sourcefrog.net>
timestamp: Mon 2005-05-02 17:20:35 +1000
message:
doc: revfile storage and related things
added:
doc/revfile.txt
modified:
TODO
doc/index.txt
=== modified file 'TODO'
--- a/TODO 2005-04-29 03:32:40 +0000
+++ b/TODO 2005-05-02 07:20:35 +0000
@@ -1,4 +1,10 @@
- -*- indented-text -*-
+.. -*- mode: rst; compile-command: "rest2html TODO >doc/todo.html" -*-
+
+
+*******************
+Things to do in bzr
+*******************
+
See also various low-level TODOs in the source code. Try looking in
the list archive or on gmane.org for previous discussion of these
@@ -133,10 +139,6 @@
- perhaps a pattern that matches only directories or non-directories
-* Expansion of $Id$ keywords within working files. Perhaps do this in
- exports first as a simpler case because then we don't need to deal
- with removing the tags on the way back in.
-
* Consider using Python logging library as well as/instead of
bzrlib.trace.
@@ -149,7 +151,7 @@
Consider using ZopeInterface definitions for the external interface;
I think these are already used in PyBaz. They allow automatic
checking of the interface but may be unfamiliar to general Python
- developers.
+ developers, so I'm not really keen.
* Commands to dump out all command help into a manpage or HTML file or
whatever.
@@ -158,6 +160,24 @@
Large things
------------
+* Generate annotations from current file relative to previous
+ annotations.
+
+ - Is it necessary to store any kind of annotation where data was
+ deleted?
+
+* Update revfile format and make it active:
+
+ - Texts should be identified by something keyed on the revision, not
+ an individual text-id. This is much more useful for annotate I
+ think; we want to map back to the revision that last changed it.
+
+ - Access revfile revisions through the Tree/Store classes.
+
+ - Check them from check commands.
+
+ - Store annotations.
+
* Hooks for pre-commit, post-commit, etc.
Consider the security implications; probably should not enable hooks
@@ -174,3 +194,8 @@
* GUI (maybe in Python GTK+?)
* C library interface
+
+* Expansion of $Id$ keywords within working files. Perhaps do this in
+ exports first as a simpler case because then we don't need to deal
+ with removing the tags on the way back in.
+
=== modified file 'doc/index.txt'
--- a/doc/index.txt 2005-04-26 05:20:17 +0000
+++ b/doc/index.txt 2005-05-02 07:20:35 +0000
@@ -107,6 +107,8 @@
* `Patch pools <pool.html>`__ to efficiently store related branches.
+* `Revfiles <revfile.html>`__ store the text history of files.
+
* `Revision syntax <revision-syntax.html>`__ -- ``hello.c at 12``, etc.
* `Roll-up commits <rollup.html>`__ -- a single revision incorporates
=== added file 'doc/revfile.txt'
--- a/doc/revfile.txt 1970-01-01 00:00:00 +0000
+++ b/doc/revfile.txt 2005-05-02 07:20:35 +0000
@@ -0,0 +1,100 @@
+********
+Revfiles
+********
+
+The unit for compressed storage in bzr is a *revfile*, whose design
+was suggested by Matt Mackall.
+
+
+Requirements
+============
+
+Compressed storage is a tradeoff between several goals:
+
+* Reasonably compact storage of long histories.
+
+* Robustness and simplicity.
+
+* Fast extraction of versions and addition of new versions (preferably
+ without rewriting the whole file, or reading the whole history.)
+
+* Fast and precise annotations.
+
+* Storage of files of at least a few hundred MB.
+
+
+Design
+======
+
+revfiles store the history of a single logical file, which is
+identified in bzr by its file-id. In this sense they are similar to
+an RCS or CVS ``,v`` file or an SCCS sfile.
+
+Each state of the file is called a *text*.
+
+Renaming, adding and deleting this file are handled at a higher level
+by the inventory system, and are outside the scope of the revfile. The
+revfile name is typically based on the file-id, which is itself
+typically based on the name the file had when it was first added, but
+this is purely cosmetic.
+
+ For example a file now called ``frob.c`` may have the id
+ ``frobber.c-12873`` because it was originally called
+ ``frobber.c``. Its texts are kept in the revfile
+ ``.bzr/revfiles/frobber.c-12873.revs``.
+
+When the file is deleted from the inventory the revfile does not
+change. It's just not used in reproducing trees from that point
+onwards.
+
+The revfile does not record the date when the text was added, a commit
+message, properties, or any other metadata. That is handled in the
+higher-level revision history.
+
+Inventories and other metadata files that vary from one version to the
+next can themselves be stored in revfiles.
+
+revfiles store files as simple byte streams, with no consideration of
+translating character sets, line endings, or keywords. Those are also
+handled at a higher level. However, the revfile may make use of
+knowledge that a file is line-based in generating a diff.
+
+ (The Python standard-library difflib is too slow when generating a
+ purely byte-by-byte delta so we always make a line-by-line diff;
+ when this is fixed it may be feasible to use byte-by-byte diffs for
+ all files.)
+
+Files whose text does not change from one revision to the next are
+stored as just a single text in the revfile. This can happen even if
+the file was renamed or other properties were changed in the
+inventory.
+
+
+Skip-deltas
+-----------
+
+Because the basis of a delta does not need to be the text's logical
+predecessor, we can adjust the deltas
+
+
+Annotations
+-----------
+
+Storing
+
+
+Open issues
+===========
+
+* revfiles use unsigned 32-bit integers both in diffs and the index.
+ This should be more than enough for any reasonable source file but
+ perhaps not enough for large binaries that are frequently committed.
+
+ Perhaps for those files there should be an option to continue to use
+ the text-store. There is unlikely to be any benefit in holding
+ deltas between them, and deltas will anyhow be hard to calculate.
+
+* The append-only design does not allow for destroying committed data,
+ as when confidential information is accidentally added. That could
+ be handled by creating the cleaned repository as a separate branch,
+ into which only the preserved revisions are exported.
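
The storage model described in doc/revfile.txt above can be illustrated with a toy sketch in Python. This is not bzr's revfile format or code: the class name ``ToyRevfile``, its methods, the in-memory tuple layout, and the use of SHA-1 to spot identical texts are all assumptions made for illustration. It tries to show three points from the design: texts are only ever appended, an unchanged text is stored once, and the basis of a delta may be any earlier text rather than the logical predecessor. Deltas are line-by-line and built with ``difflib.SequenceMatcher``.

```python
import difflib
import hashlib


class ToyRevfile:
    """Hypothetical in-memory stand-in for a revfile (append-only)."""

    def __init__(self):
        # Each entry is (basis_index_or_None, payload). With no basis the
        # payload is the full list of lines; otherwise it is a list of
        # (start, end, replacement_lines) hunks against the basis text.
        self._entries = []
        # Assumption: SHA-1 of the full text is used to detect duplicates.
        self._by_sha = {}

    def add(self, text, basis=None):
        """Append bytes `text`; `basis` may be the index of any earlier text."""
        sha = hashlib.sha1(text).hexdigest()
        if sha in self._by_sha:
            # An unchanged text is stored only once.
            return self._by_sha[sha]
        lines = text.splitlines(keepends=True)
        if basis is None:
            payload = lines
        else:
            old = self.get_lines(basis)
            matcher = difflib.SequenceMatcher(None, old, lines, autojunk=False)
            # Keep only the regions that differ from the basis.
            payload = [(i1, i2, lines[j1:j2])
                       for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                       if tag != "equal"]
        self._entries.append((basis, payload))
        index = len(self._entries) - 1
        self._by_sha[sha] = index
        return index

    def get_lines(self, index):
        """Rebuild the list of lines for the text stored at `index`."""
        basis, payload = self._entries[index]
        if basis is None:
            return list(payload)
        lines = self.get_lines(basis)
        # Apply hunks from the end so earlier offsets stay valid.
        for start, end, new in reversed(payload):
            lines[start:end] = new
        return lines

    def get(self, index):
        """Return the full text stored at `index` as bytes."""
        return b"".join(self.get_lines(index))


rf = ToyRevfile()
v0 = rf.add(b"a\nb\nc\n")                 # stored as a full text
v1 = rf.add(b"a\nB\nc\n", basis=v0)       # delta against its predecessor
v2 = rf.add(b"a\nB\nc\nd\n", basis=v0)    # delta against a non-predecessor
same = rf.add(b"a\nB\nc\n")               # duplicate text, nothing appended
```

Choosing ``basis=v0`` for ``v2`` means it can be rebuilt by applying one delta instead of two, at the cost of a slightly larger delta. A real implementation would also need an on-disk index and would have to respect the 32-bit limits mentioned under "Open issues".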