[Pkg-db-devel] Bug#417204: db4.5_load manual page should recommend sorting the input
Frederik Eaton
frederik at a5.repetae.net
Wed Aug 8 06:06:43 UTC 2007
On Tue, Aug 07, 2007 at 07:53:12PM -0400, Clint Adams wrote:
> On Sun, Apr 01, 2007 at 09:09:14PM +0100, Frederik Eaton wrote:
> > I find that when I sort the input (by key, of course) to db4.5_load,
> > it runs about 200 times faster. If the time to do the sorting is
> > included, then the speed-up is closer to 100, but it is still enough
> > of a speed-up that I think the manual page should recommend that users
> > try sorting their input. Also, the resulting database file is about
> > 1/3 smaller.
>
> Would you care to suggest some verbiage?
I can try:
----------------------------------------------------------------
The input to db4.5_load must be in the output format specified by the
db4.5_dump utility, utilities, or as specified for the -T below.
+ No sorting is performed by db4.5_load itself, but some database
+ types (such as Btree) perform much more efficiently if
+ operations on similar keys occur together. For these database
+ types, sorting the input to db4.5_load can yield a net 100x
+ speed-up and is usually recommended. For example, if "foo.txt"
+ is a tab-delimited file, it can be loaded into a Btree with
+ (/bin/sh):
+
+ LANG="" sort -t$'\t' -u foo.txt | tr $'\t' $'\n' | \
+ db4.5_load -T -t btree foo.db
OPTIONS
-c Specify configuration options ignoring any value they may have
based on the input. The command-line format is name=value. See
----------------------------------------------------------------
I don't know whether those two command lines are POSIX-compliant sh;
they're just what I use in my scripts. If you have a more "canonical"
version of the code, I'd be interested to see it.
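For what it's worth, here is a rough sketch of a variant that avoids the
$'\t' quoting, which plain POSIX sh is not guaranteed to understand. The
function name sort_for_load is made up for illustration and is not part
of Berkeley DB. It sets LC_ALL=C rather than LANG="" because LC_ALL, if
set in the environment, overrides LANG. It also adds -k 1,1 so that only
the key field is compared, which means -u keeps a single line per key
(POSIX does not say which one). It assumes, like the lines above, that
values contain no tabs.

```shell
# Sketch only: sort_for_load is a hypothetical helper, not a Berkeley DB tool.
# Reads the tab-delimited file "$1" (one key<TAB>value pair per line) and
# prints the pairs sorted by key in byte order, with each key and each value
# on its own line, which is the layout db4.5_load -T reads.
sort_for_load() {
    tab=$(printf '\t')    # portable way to get a literal tab character
    LC_ALL=C sort -t "$tab" -k 1,1 -u "$1" | tr '\t' '\n'
}
```

It would then be used along the lines of:

    sort_for_load foo.txt | db4.5_load -T -t btree foo.db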
Best,
Frederik