[Pkg-bazaar-commits] ./bzr/unstable r321: doc: revfile storage and related things

Martin Pool mbp at sourcefrog.net
Fri Apr 10 07:51:50 UTC 2009


------------------------------------------------------------
revno: 321
committer: Martin Pool <mbp at sourcefrog.net>
timestamp: Mon 2005-05-02 17:20:35 +1000
message:
  doc: revfile storage and related things
added:
  doc/revfile.txt
modified:
  TODO
  doc/index.txt
-------------- next part --------------
=== modified file 'TODO'
--- a/TODO	2005-04-29 03:32:40 +0000
+++ b/TODO	2005-05-02 07:20:35 +0000
@@ -1,4 +1,10 @@
-                                                 -*- indented-text -*-
+.. -*- mode: rst; compile-command: "rest2html TODO >doc/todo.html" -*- 
+
+
+*******************
+Things to do in bzr
+*******************
+
 
 See also various low-level TODOs in the source code.  Try looking in
 the list archive or on gmane.org for previous discussion of these
@@ -133,10 +139,6 @@
 
   - perhaps a pattern that matches only directories or non-directories
 
-* Expansion of $Id$ keywords within working files.  Perhaps do this in
-  exports first as a simpler case because then we don't need to deal
-  with removing the tags on the way back in.
-
 * Consider using Python logging library as well as/instead of
   bzrlib.trace.
 
@@ -149,7 +151,7 @@
   Consider using ZopeInterface definitions for the external interface;
   I think these are already used in PyBaz.  They allow automatic
   checking of the interface but may be unfamiliar to general Python
-  developers.
+  developers, so I'm not really keen.
 
 * Commands to dump out all command help into a manpage or HTML file or
   whatever.
@@ -158,6 +160,24 @@
 Large things
 ------------
 
+* Generate annotations from current file relative to previous
+  annotations.
+
+  - Is it necessary to store any kind of annotation where data was
+    deleted?
+
+* Update revfile format and make it active:
+
+  - Texts should be identified by something keyed on the revision, not
+    an individual text-id.  I think this is much more useful for
+    annotate, because we want to map back to the revision that last changed it.
+
+  - Access revfile revisions through the Tree/Store classes.
+
+  - Check them from check commands.
+
+  - Store annotations.
+
 * Hooks for pre-commit, post-commit, etc.
 
   Consider the security implications; probably should not enable hooks
@@ -174,3 +194,8 @@
 * GUI (maybe in Python GTK+?)
 
 * C library interface
+
+* Expansion of $Id$ keywords within working files.  Perhaps do this in
+  exports first as a simpler case because then we don't need to deal
+  with removing the tags on the way back in.
+
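
A minimal sketch of the TODO item "Generate annotations from current file relative to previous annotations". It assumes annotations are held as a list of (revision id, line) pairs. This is not bzrlib code: `annotate_update` is a hypothetical name, and Python's standard difflib stands in for whatever diff bzr would actually use. Lines that are unchanged keep their earlier revision, and new or replaced lines are credited to the new revision. Deletions record nothing, which is only one possible answer to the open question about deleted data.

```python
import difflib
from typing import List, Tuple

# Hypothetical representation: one (revision_id, line) pair per line of text.
Annotation = List[Tuple[str, str]]


def annotate_update(prev: Annotation, new_lines: List[str], new_rev: str) -> Annotation:
    """Derive annotations for new_lines from the previous annotations.

    Unchanged lines keep the revision that last changed them; inserted or
    replaced lines are attributed to new_rev.  Deleted lines simply drop
    out, so nothing is recorded where data was removed.
    """
    old_lines = [line for _, line in prev]
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    result: Annotation = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Carry the existing origin forward unchanged.
            result.extend(prev[i1:i2])
        else:
            # 'insert' or 'replace': new text belongs to new_rev.
            # For 'delete', j1 == j2, so nothing is added.
            result.extend((new_rev, line) for line in new_lines[j1:j2])
    return result
```

For example, annotating ["a\n", "B\n", "c\n"] as rev-2 against a rev-1 text of ["a\n", "b\n"] credits "a\n" to rev-1 and the other two lines to rev-2.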

=== modified file 'doc/index.txt'
--- a/doc/index.txt	2005-04-26 05:20:17 +0000
+++ b/doc/index.txt	2005-05-02 07:20:35 +0000
@@ -107,6 +107,8 @@
 
 * `Patch pools <pool.html>`__ to efficiently store related branches.
 
+* `Revfiles <revfile.html>`__ store the text history of files.
+
 * `Revision syntax <revision-syntax.html>`__ -- ``hello.c@12``, etc.
 
 * `Roll-up commits <rollup.html>`__ -- a single revision incorporates

=== added file 'doc/revfile.txt'
--- a/doc/revfile.txt	1970-01-01 00:00:00 +0000
+++ b/doc/revfile.txt	2005-05-02 07:20:35 +0000
@@ -0,0 +1,100 @@
+********
+Revfiles
+********
+
+The unit for compressed storage in bzr is a *revfile*, whose design
+was suggested by Matt Mackall.
+
+
+Requirements
+============
+
+Compressed storage is a tradeoff between several goals:
+
+* Reasonably compact storage of long histories.
+
+* Robustness and simplicity.
+
+* Fast extraction of versions and addition of new versions (preferably
+  without rewriting the whole file or reading the whole history).
+
+* Fast and precise annotations.
+
+* Storage of files of at least a few hundred MB.
+
+
+Design
+======
+
+Revfiles store the history of a single logical file, which is
+identified in bzr by its file-id.  In this sense they are similar to
+an RCS or CVS ``,v`` file or an SCCS sfile.
+
+Each state of the file is called a *text*. 
+
+Renaming, adding and deleting this file is handled at a higher level
+by the inventory system, and is outside the scope of the revfile.  The
+revfile name is typically based on the file-id, which is itself
+typically based on the name the file had when it was first added, but
+this is purely cosmetic.
+
+    For example a file now called ``frob.c`` may have the id
+    ``frobber.c-12873`` because it was originally called
+    ``frobber.c``.  Its texts are kept in the revfile
+    ``.bzr/revfiles/frobber.c-12873.revs``.
+
+When the file is deleted from the inventory, the revfile does not
+change; it is simply not used when reproducing trees from that point
+onwards.
+
+The revfile does not record the date when the text was added, a commit
+message, properties, or any other metadata.  That is handled in the
+higher-level revision history.
+
+Inventories and other metadata files that vary from one version to the
+next can themselves be stored in revfiles.
+
+Revfiles store files as simple byte streams, with no translation of
+character sets, line endings, or keywords.  Those are also handled
+at a higher level.  However, the revfile may use the knowledge that
+a file is line-based when generating a diff.
+
+   (Python's standard difflib module is too slow when generating a purely
+   byte-by-byte delta, so we always make a line-by-line diff; when this
+   is fixed it may be feasible to use byte-by-byte diffs for all
+   files.)
+
+Files whose text does not change from one revision to the next are
+stored as just a single text in the revfile.  This can happen even if
+the file was renamed or other properties were changed in the
+inventory. 
+
+
+Skip-deltas
+-----------
+
+Because the basis of a delta does not need to be the text's logical
+predecessor, we can choose bases that keep delta chains short.
+
+
+Annotations
+-----------
+
+Storing
+
+
+Open issues
+===========
+
+* Revfiles use unsigned 32-bit integers both in diffs and the index,
+  so no offset or length can reach 4 GiB.  This should be more than enough
+  for any reasonable source file, but perhaps not for large, frequently committed binaries.
+
+  Perhaps for those files there should be an option to continue to use
+  the text-store.  There is unlikely to be any benefit in holding
+  deltas between them, and deltas will in any case be hard to calculate.
+
+* The append-only design does not allow for destroying committed data,
+  as when confidential information is accidentally added.  That could
+  be fixed by creating a cleaned repository as a separate branch, into
+  which only the preserved revisions are exported.
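
The revfile notes say that deltas are made line-by-line because difflib is too slow at byte level. The sketch below shows such a delta, built with difflib.SequenceMatcher over lines. The hunk representation, (start, end, replacement lines) against the basis text, is an assumption made for illustration, because the notes do not define the on-disk delta format. Splitting with keepends=True means that joining the lines reproduces the original bytes exactly, so line endings are stored untranslated.

```python
import difflib
from typing import List, Tuple

# Hypothetical delta format: each hunk replaces basis lines [start, end)
# with the given lines.  Hunks are sorted and do not overlap.
Hunk = Tuple[int, int, List[bytes]]


def line_delta(basis: bytes, target: bytes) -> List[Hunk]:
    """Compute a line-by-line delta that turns basis into target."""
    a = basis.splitlines(keepends=True)
    b = target.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (i1, i2, b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(basis: bytes, delta: List[Hunk]) -> bytes:
    """Rebuild the target text by applying hunks to the basis."""
    a = basis.splitlines(keepends=True)
    out: List[bytes] = []
    pos = 0
    for start, end, lines in delta:
        out.extend(a[pos:start])  # copy unchanged lines before this hunk
        out.extend(lines)         # emit the replacement lines
        pos = end                 # skip the replaced basis lines
    out.extend(a[pos:])
    return b"".join(out)
```

The delta works in whole lines, so a binary file with few newlines produces coarse hunks. That is the cost of the line-based approach described in the notes.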
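
The revfile notes say that a text which does not change between revisions is stored only once. The sketch below shows one way to do that. `TextStoreSketch` is a hypothetical in-memory stand-in for a revfile, and keying texts by their SHA-1 digest is an assumption of this sketch. It also goes slightly further than the notes by reusing any identical earlier text, not just the immediate predecessor.

```python
import hashlib
from typing import Dict, List


class TextStoreSketch:
    """Hypothetical stand-in for a revfile that stores each distinct text once."""

    def __init__(self) -> None:
        self._texts: List[bytes] = []
        self._by_sha1: Dict[str, int] = {}

    def add(self, text: bytes) -> int:
        """Store text and return its index, reusing an identical text."""
        digest = hashlib.sha1(text).hexdigest()
        existing = self._by_sha1.get(digest)
        if existing is not None and self._texts[existing] == text:
            return existing  # unchanged text: no new entry is written
        self._texts.append(text)
        index = len(self._texts) - 1
        self._by_sha1[digest] = index
        return index

    def get(self, index: int) -> bytes:
        return self._texts[index]

    def __len__(self) -> int:
        return len(self._texts)
```

If a file is renamed without any content change, adding its text again returns the existing index, and nothing new is stored.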
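
The Skip-deltas section of the draft is unfinished. The following is an illustration only and is not necessarily the scheme revfiles will use. One common choice, similar to Subversion's skip-deltas, takes the basis of text n to be n with its lowest set bit cleared, and stores text 0 in full. Rebuilding text n then applies one delta per set bit of n, which is at most about log2(n) + 1 deltas instead of n.

```python
from typing import List


def skip_delta_base(n: int) -> int:
    """Basis for text n: clear the lowest set bit (text 0 is stored in full)."""
    if n < 0:
        raise ValueError("text index must be non-negative")
    return n & (n - 1)


def delta_chain(n: int) -> List[int]:
    """Texts whose deltas must be applied, in order, to rebuild text n."""
    chain = []
    while n != 0:
        chain.append(n)
        n = skip_delta_base(n)
    return list(reversed(chain))
```

For the first 1024 texts, the longest chain is 10 deltas (for text 1023), compared with 1023 if every text were deltaed against its predecessor. Every basis is an earlier text, so new texts can still be appended without rewriting old ones. The trade-off is that a delta against an older text may be larger.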
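
Python's struct module can make the open issue about unsigned 32-bit integers concrete. The record layout below is hypothetical and is not taken from the revfile format: a 20-byte SHA-1, followed by basis, offset and length as big-endian unsigned 32-bit fields. The sketch shows two things. First, fixed-size records let a reader seek straight to record i without reading the whole history. Second, no field can hold a value of 2**32 (4 GiB) or more.

```python
import hashlib
import struct
from typing import Tuple

# Hypothetical index record: 20-byte SHA-1 digest, then basis, offset and
# length as big-endian unsigned 32-bit integers (32 bytes in total).
RECORD = struct.Struct(">20sIII")
NO_BASIS = 0xFFFFFFFF  # hypothetical marker for a text stored in full


def pack_record(text: bytes, basis: int, offset: int, length: int) -> bytes:
    """Encode one index entry, refusing values that overflow 32 bits."""
    for name, value in (("basis", basis), ("offset", offset), ("length", length)):
        if not 0 <= value < 2**32:
            raise OverflowError(f"{name}={value} does not fit in 32 bits")
    return RECORD.pack(hashlib.sha1(text).digest(), basis, offset, length)


def read_record(index: bytes, i: int) -> Tuple[bytes, int, int, int]:
    """Decode entry i directly; fixed-size records need no scan."""
    return RECORD.unpack_from(index, i * RECORD.size)
```

Packing an offset of 2**32 raises OverflowError. A large binary that is committed frequently could hit this limit.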


