[Shootout-list] Stuff

Mon, 25 Apr 2005 12:44:52 +0100

On Monday 25 April 2005 10:34, Bengt Kleberg wrote:
> Jon Harrop wrote:
> > For example, here is a minimal core of the "nth" program in OCaml:
> >
> >   let rec nth_nn n (i, io) = match n with
> >     0 -> singleton (i, io)
> >   | 1 -> fold (fun (j, jo) s ->
> >       add (j, add_i io jo) s) bonds.(i) empty
> >   | n ->
> >       let pprev = nth_nn (n - 2) (i, io) in
> >       let prev = nth_nn (n - 1) (i, io) in
> >       let aux j t = union (nth_nn 1 j) t in
> >       diff (diff (fold aux prev empty) prev) pprev
>
> would it be possible to explain this program without using ocaml (or any
> other programming languge)?

Sure, NP.

Basically, given a graph (the sets of nearest neighbours of each vertex), you 
want to find the "n"th-nearest neighbours of the "i"th vertex in a graph.
This is most naturally written as a recurrence relation.

You define the 0th neighbour of a vertex "i" to be the set containing only i: 
{i}. The first neighbours of "i" were given in the input so you just look 
them up. The 2nd-nearest neighbours of "i" are the neighbours of the 
neighbours of "i" which are not the neighbours of "i" and which are not "i" 
itself.

In general, the "n"th-nearest neighbours may be written in terms of the 
"n-1"th- and "n-2"th-nearest neighbours: The "n"th-nearest neighbours of "i" 
are the neighbours of the "n-1"th nearest neighbours of "i" without the 
"n-1"th- and "n-2"th-nearest neighbours of "i".

That would lead to code like this:

  let rec nth_nn n i = match n with
    0 -> singleton i
  | 1 -> bonds.(i)
  | n ->
      let pprev = nth_nn (n - 2) i in
      let prev = nth_nn (n - 1) i in
      let aux j t = union (nth_nn 1 j) t in
      diff (diff (fold aux prev empty) prev) pprev

However, there is an additional quirk. Scientists simulate atomic structures 
by "wrapping them around" a cube, known as periodic boundary conditions. This 
means the atoms on one face of the cubic simulation are taken to be bonded to 
the atoms on the opposite face. So the number of atoms simulated is finite i 
= [1..N] but this actually models an infinite number of atoms, tiled to cover 
all of space.

In scientific computing, programmers often gloss over this and their programs 
silently produce erroneous results when they are required to work over 
structures approaching the size of the cube used in the simulation.

This can be attributed to the use of only the index "i" to refer to an atom, 
allowing you to refer only to one of "N" atoms. I get around this caveat by 
using both "i" and the index of the cube that the atom lies in, allowing you 
to refer to any of the infinite number of atoms uniquely. So, the atom 
denoted "(1, (0, 0, 0))" is the first atom in the cube at the origin and "(2, 
(-1, 0, 0))" is the second atom in the cube just to the left of the cube at 
the origin.

For example, if atom "1" is on the left-face of the cube and its neighbouring 
atom "2" is on the right face, I do not simply say that "1" is bonded to "2" 
but, rather, that (1, (0, 0, 0)) is bonded to (2, (-1, 0, 0)).

In the OCaml code, I've used (i, io) to denote the "i"th atom in the cube at 
offset "io". As you can see, the original code is only slightly longer but 
correctly handles this common source of errors.

Scientists studying a wide variety of atomic structures using similar programs 
to look at all sorts of interesting phenomena. These programs are reasonably 
computationally expensive (they often take an hour to run) but a factor of 
two isn't so much of a big deal, writing a correct program in less time than 
it takes to run is more important.

So I think this would make a great benchmark for the shootout because it is a 
new class of problem not seen before on the shootout, it emphasises set union 
(which no other benchmark does AFAIK), it is somewhat computationally 
expensive and it is of practical importance.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists