[Pkg-db-devel] Bug#417204: db4.5_load manual page should recommend sorting the input

Frederik Eaton frederik at a5.repetae.net
Wed Aug 8 06:06:43 UTC 2007


On Tue, Aug 07, 2007 at 07:53:12PM -0400, Clint Adams wrote:
> On Sun, Apr 01, 2007 at 09:09:14PM +0100, Frederik Eaton wrote:
> > I find that when I sort the input (by key, of course) to db4.5_load,
> > it runs about 200 times faster. If the time to do the sorting is
> > included, then the speed-up is closer to 100, but it is still enough
> > of a speed-up that I think the manual page should recommend that users
> > try sorting their input. Also, the resulting database file is about
> > 1/3 smaller.
> 
> Would you care to suggest some verbiage?

I can try:

----------------------------------------------------------------
        The input to db4.5_load must be in the output format specified  by  the
        db4.5_dump utility, utilities, or as specified for the -T below.

+       No sorting is performed by db4.5_load itself, but some database
+       types (such as Btree) perform much more efficiently if
+       operations on similar keys occur together. For these database
+       types, sorting the input to db4.5_load can yield a net 100x
+       speed-up and is usually recommended. For example, if "foo.txt"
+       is a tab-delimited file, it can be loaded into a Btree with
+       (/bin/sh):
+
+            LANG="" sort -t$'\t' -u foo.txt | tr $'\t' $'\n' | \
+                db4.5_load -T -t btree foo.db
 OPTIONS
        -c     Specify  configuration  options ignoring any value they may have
               based on the input.  The command-line format is name=value.  See
----------------------------------------------------------------

I don't know whether those two lines are POSIX-compliant sh; they're
basically what I use in my scripts. If you have a more "canonical"
version of the code, I'd be interested to see it.
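
One possibly more portable variant is sketched below. It avoids the
$'\t' quoting, which older POSIX shells may not understand, by building
the tab with printf, and it uses LC_ALL=C rather than LANG="" so that an
inherited LC_ALL cannot change the sort order. The helper name
prepare_input is made up for illustration, and the sketch assumes keys
and values contain no embedded tabs or newlines.

```sh
#!/bin/sh
# Build a literal tab character without relying on $'\t'.
TAB=$(printf '\t')

# prepare_input FILE (hypothetical helper name)
# Sort tab-delimited "key<TAB>value" lines in plain byte order, drop
# exact duplicate lines, then put each key and value on its own line,
# as in the pipeline from the proposed manual page text.
prepare_input() {
    LC_ALL=C sort -t "$TAB" -u "$1" | tr '\t' '\n'
}

# Intended use, assuming db4.5_load is installed:
#   prepare_input foo.txt | db4.5_load -T -t btree foo.db
```

Sorting in the C locale is a deliberate choice here: it compares raw
bytes, which should line up with how a Btree orders keys when no custom
comparison function is configured.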

Best,

Frederik



