[Shootout-list] Binary trees

Jon Harrop jon@ffconsultancy.com
Sun, 19 Jun 2005 17:33:56 +0100


I've just had a look at binarytrees and it is a great example of what I've 
been talking about. The spec is simple enough: create and manipulate some 
(prebalanced) binary trees.

However, the chosen trees and manipulations yield such trivial results that 
all of the implementations can be optimised anywhere from completely naive to 
literally printing out the result (without having to do anything with trees):

stretch tree of depth 17         check: -1
131072   trees of depth 4        check: -131072
32768    trees of depth 6        check: -32768
8192     trees of depth 8        check: -8192
2048     trees of depth 10       check: -2048
512      trees of depth 12       check: -512
128      trees of depth 14       check: -128
32       trees of depth 16       check: -32
long lived tree of depth 16      check: -1

I have just observed that the slightest alterations to the implementations 
produce enormous (many times) changes in performance.

For example, I've written the simplest possible C++ implementation which uses 
the following code to represent a tree:

struct T {
  virtual ~T() {};
  virtual int check() const = 0;
};

struct Empty : public T {
  virtual int check() const { return 0; }
};

struct Node : public T {
  T *l, *r;
  int i;
  Node(T *l2, int i2, T *r2) : l(l2), i(i2), r(r2) {};
  virtual ~Node() { delete l; delete r; }
  virtual int check() const { return l->check() + i - r->check(); }
};

This is not totally unlike a real program. In particular, this tree can be 
empty (containing only one T object, Empty). The other implementations (e.g. 
OCaml, MLton, Haskell) can't represent an empty tree. Does this make them 
invalid?

A quick profile of the above code shows that 50% of the time is spent in 
Node::~Node(), deallocating non-leaf nodes of the trees.

The obvious way to optimise this, and one which is commonly used in practice, 
is to cut a level of leaves off the trees, i.e. special case the subtree: 
"Node(Empty, i, Empty)". This is exactly what the other implementations have 
already done and is precisely why the C++ is half as fast as them. Look at 
Christophe's OCaml:

  type int_tree = Empty of int | Node of int_tree * int * int_tree

It can't represent an empty tree and what was "Node(Empty, i, Empty)" is now 
called "Empty(1)" (which is a misnoma because an empty node shouldn't contain 
anything!).

The obvious way to optimise this is to chop off even more leaves. Everytime we 
chop off a level of leaves we alleviate the GC from _half_ of its work! So we 
get this:

  type t = Node1 of int | Node3 of int * int * int | Node of t * int * t

and the program takes 4s to run instead of 7.5s.

Now, this begs the question how far can I take this optimisation before Isaac 
starts whining? I can hear Isaac saying that I must mimic the existing 
implementations (i.e. use their broken representation of a tree).

Ok then. I'll optimise it by replacing all the "new" and "delete"s in the C++ 
with a single pair and allocate nodes myself from a preallocated array of 
nodes. That'll be many times faster and there is still a tree.

Or how about this, I replace the tree data structure in OCaml with a recursive 
closure? That won't even need to build the tree, there will be nothing to GC 
and it will probably be orders of magnitude faster as a consequence. Even 
better, it's subtle, and Isaac probably won't notice. ;-)

For me, the propect of having Isaac throw a submission back in my face because 
of a rule he just imagined is the strongest reason to not join in development 
of the shootout. If people were free to submit their own crazy optimisations 
in their favourite language then I think the shootout would be much more 
popular (just look at the software texture mapping competitions in days of 
old).

Also, these problems with binarytrees will disappear if we follow Skaller's 
advise for designing the tests. This test shouldn't state that you must use a 
binary tree, it should simply require you to solve a problem that is best 
attacked with a binary tree.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists