[Shootout-list] Binary trees
Jon Harrop
jon@ffconsultancy.com
Sun, 19 Jun 2005 17:33:56 +0100
I've just had a look at binarytrees and it is a great example of what I've
been talking about. The spec is simple enough: create and manipulate some
(prebalanced) binary trees.
However, the chosen trees and manipulations yield such trivial results that
all of the implementations can be optimised anywhere from completely naive to
literally printing out the result (without having to do anything with trees):
stretch tree of depth 17 check: -1
131072 trees of depth 4 check: -131072
32768 trees of depth 6 check: -32768
8192 trees of depth 8 check: -8192
2048 trees of depth 10 check: -2048
512 trees of depth 12 check: -512
128 trees of depth 14 check: -128
32 trees of depth 16 check: -32
long lived tree of depth 16 check: -1
I have just observed that the slightest alterations to the implementations
produce enormous (many times) changes in performance.
For example, I've written the simplest possible C++ implementation which uses
the following code to represent a tree:
struct T {
virtual ~T() {};
virtual int check() const = 0;
};
struct Empty : public T {
virtual int check() const { return 0; }
};
struct Node : public T {
T *l, *r;
int i;
Node(T *l2, int i2, T *r2) : l(l2), i(i2), r(r2) {};
virtual ~Node() { delete l; delete r; }
virtual int check() const { return l->check() + i - r->check(); }
};
This is not totally unlike a real program. In particular, this tree can be
empty (containing only one T object, Empty). The other implementations (e.g.
OCaml, MLton, Haskell) can't represent an empty tree. Does this make them
invalid?
A quick profile of the above code shows that 50% of the time is spent in
Node::~Node(), deallocating non-leaf nodes of the trees.
The obvious way to optimise this, and one which is commonly used in practice,
is to cut a level of leaves off the trees, i.e. special case the subtree:
"Node(Empty, i, Empty)". This is exactly what the other implementations have
already done and is precisely why the C++ is half as fast as them. Look at
Christophe's OCaml:
type int_tree = Empty of int | Node of int_tree * int * int_tree
It can't represent an empty tree and what was "Node(Empty, i, Empty)" is now
called "Empty(1)" (which is a misnoma because an empty node shouldn't contain
anything!).
The obvious way to optimise this is to chop off even more leaves. Everytime we
chop off a level of leaves we alleviate the GC from _half_ of its work! So we
get this:
type t = Node1 of int | Node3 of int * int * int | Node of t * int * t
and the program takes 4s to run instead of 7.5s.
Now, this begs the question how far can I take this optimisation before Isaac
starts whining? I can hear Isaac saying that I must mimic the existing
implementations (i.e. use their broken representation of a tree).
Ok then. I'll optimise it by replacing all the "new" and "delete"s in the C++
with a single pair and allocate nodes myself from a preallocated array of
nodes. That'll be many times faster and there is still a tree.
Or how about this, I replace the tree data structure in OCaml with a recursive
closure? That won't even need to build the tree, there will be nothing to GC
and it will probably be orders of magnitude faster as a consequence. Even
better, it's subtle, and Isaac probably won't notice. ;-)
For me, the propect of having Isaac throw a submission back in my face because
of a rule he just imagined is the strongest reason to not join in development
of the shootout. If people were free to submit their own crazy optimisations
in their favourite language then I think the shootout would be much more
popular (just look at the software texture mapping competitions in days of
old).
Also, these problems with binarytrees will disappear if we follow Skaller's
advise for designing the tests. This test shouldn't state that you must use a
binary tree, it should simply require you to solve a problem that is best
attacked with a binary tree.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists