New on-disk index

Enrico Zini enrico at enricozini.org
Tue Feb 21 15:33:17 UTC 2006


On Tue, Feb 21, 2006 at 03:15:15PM +0100, Benjamin Mesing wrote:

> looks pretty cool. I'm looking forward to seeing the performance
> improvements in the end user applications :-)

Yeah, me too!  There should be an update of libapt-front soonish, once
we finish implementing the different way of running debtags update.


> > output                        18910ms                 7740ms        50ms
> What does this output? Only the package names and the tags, or also the
> package information?

It outputs the entire contents of the collection (item names and their
tags).  Those numbers are very high because the benchmark calls that
function 500 times and there are 10000 items in the collection, which
is quite a lot.  The high numbers compared to TDBIndexer are (IMO)
overhead from converting int IDs to strings; but that isn't the kind of
function you'd call in a very inner loop, so I wouldn't bother
optimizing it too much.

> > outputHavingTags               1330ms                   90ms       200ms
> What does this output? Packages having a given set of tags?

Yes: like output, but it only outputs those items that have at least all
the tags in a given set.
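
The "at least all the tags in a given set" test is a subset check.  A
minimal sketch, assuming a plain std::map from item name to tag set
rather than the real on-disk index (the Collection alias and
itemsHavingTags are hypothetical names):

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a tagged collection: item name -> set of tags.
using Collection = std::map<std::string, std::set<std::string>>;

// Return the names of the items whose tag set contains every tag in `wanted`.
std::vector<std::string> itemsHavingTags(const Collection& coll,
                                         const std::set<std::string>& wanted)
{
    std::vector<std::string> result;
    for (const auto& entry : coll) {
        const std::set<std::string>& tags = entry.second;
        // std::includes needs both ranges sorted; std::set iterates in order.
        if (std::includes(tags.begin(), tags.end(),
                          wanted.begin(), wanted.end()))
            result.push_back(entry.first);
    }
    return result;
}
```

An item with extra tags still matches, and an empty `wanted` set matches
every item, so in that case the result is the same as plain output.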

> Which tags did you choose for your tests? The selection of tags might
> influence the test results.

I create a random dataset through questionable use of possibly badly
implemented statistical distributions, but I make it deterministic by
calling srand with a fixed value at the beginning of the benchmark.

The idea is to generate a distribution where a few items have lots of
tags and most items have 5 or 6, and where a few tags are really
popular and most tags are used only every now and then.  But I'm dead
rusty with simulation code, and I wouldn't mind someone taking a look at
that code
(svn+ssh://svn.debian.org/svn/debtags/tagcoll/trunk/bench/collection.cc).
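
For readers without access to the repository, here is a rough sketch of
what a deterministic, skewed generator of this kind could look like.  It
is not the code in bench/collection.cc: the function names, the 5%
share of heavily tagged items and the squared-uniform skew are all
assumptions chosen for illustration.

```cpp
#include <cstdlib>
#include <set>
#include <vector>

// Hypothetical generator, NOT the code in bench/collection.cc.

// Uniform draw in [0, 1).
static double uniform01()
{
    return std::rand() / (RAND_MAX + 1.0);
}

// Pick a tag ID in [0, tagCount), with low IDs much more likely than high
// ones.  Squaring a uniform draw is an arbitrary way to skew it toward 0.
static int skewedTag(int tagCount)
{
    double u = uniform01();
    return static_cast<int>(u * u * tagCount);
}

// Most items get 5 or 6 tags; about one in twenty gets 20 to 39 tags.
static int tagsPerItem()
{
    if (uniform01() < 0.05)
        return 20 + std::rand() % 20;
    return 5 + std::rand() % 2;
}

// Build `itemCount` tag sets.  Seeding with a fixed value makes every run
// produce the same dataset, so benchmark results stay comparable.
std::vector<std::set<int>> makeDataset(unsigned seed, int itemCount, int tagCount)
{
    std::srand(seed);
    std::vector<std::set<int>> items(itemCount);
    for (auto& tags : items) {
        int wanted = tagsPerItem();
        if (wanted > tagCount)
            wanted = tagCount;  // cannot hold more distinct tags than exist
        // Keep drawing until the set holds `wanted` distinct tags.
        while (static_cast<int>(tags.size()) < wanted)
            tags.insert(skewedTag(tagCount));
    }
    return items;
}
```

Calling makeDataset twice with the same seed and sizes returns identical
datasets, which is the property the fixed srand value is there to
provide.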


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at debian.org>

More information about the Debtags-devel mailing list