[Shootout-list] Science-related benchmarks

Jon Harrop jon@ffconsultancy.com
Mon, 25 Apr 2005 13:47:29 +0100


On Monday 25 April 2005 11:06, Sebastien Loisel wrote:
> > 1. Get rid of the magic numbers (e.g. 0.8888... / 2.).
>
> You could replace them with formulae for the Chebyshev nodes and weights.

Yes, exactly.

> > 3. Don't use macros in C++.
>
> They're part of the language.

That doesn't make it good style.

> > Both programs and Mathematica, I get "8.787608...". The C++ took 4.366s
> > and the OCaml took 5.942s.
>
> According to Xavier Leroy, he thinks the ocaml in the original spends
> most of its time in the heap code. I think what you've done is reduced
> that penalty by increasing the amount of floating point calculations
> that need to be done, and ocaml is smart enough to unbox the floating
> point calculations because they're all in a single function. Is that
> right?

I made several other optimisations as well. They don't buy you much. More 
importantly, the ratio of the times for the C++ and OCaml versions vary a lot 
depending upon the integrand. I haven't found a single example where OCaml is 
faster than C++ though.

> > The FFTW-based MEM program from my book is of practical interest and, I
> > believe, is representative of many other tasks performed by scientists.
>
> I'd have to see the code.

It's freely available:

  http://www.ffconsultancy.com/products/ocaml_for_scientists/complete/

> > > > 1. Sphere: Recursively subdivide an icosahedron to produce an
> > > > arbitrarily
> >
> > I think you're thinking of uniform subdivision. I'm thinking of adaptive
>
> I was thinking of both. For adaptive subdivision I use map<> data
> structures, which will make Fortran, C etc... a pain. My Ph.D. is
> about PDEs on the sphere, so I use this too.

Yes, it isn't array+floats. If I can find the time then I'll knock up an OCaml 
implementation.

> > Some languages may be able to represent the lists (and their tails) as
> > arrays
>
> Which languages?

In the shootout? No idea.

> > I'm not sure what you mean by "building the multiresolution thing".
>
> http://www.eso.org/projects/esomidas/doc/user/98NOV/volb/node316.html

I'd rather see a really simple function that people can work through to 
understand what the wavelet transform is essentially doing.

Here's the list-based 1D Haar wavelet transform for integer-power-of-two 
length input from my book:

let haar l =
  let rec aux l s d = match l, s, d with
    [s], [], d -> s :: d
  | [], s, d -> aux s [] d
  | h1::h2::t, s, d -> aux t (h1 +. h2 :: s) (h1 -. h2 :: d)
  | _ -> invalid_arg "haar" in
  aux l [] []

I think that is suitably simple and interesting. Note that lists are only 
decapitated and prepended to, showing that the memory access for an 
array-based implementation could be contiguous.

You could regard this as a floating-point intensive benchmark.

> > Yes. What about raytracing the old sphere-flake? Something like this:
>
> Do you build a special sphere-flake-ray test that's efficient, or do
> you stick one million spheres in a general purpose hierarchical
> bounding volume? The latter seems more interesting for the benchmark,
> but I suspect it'll be large loc.

Instead of using hierarchical bounding volumes, you could compute the sets of 
spheres which intersect horizontal and vertical planes. Then you can restrict 
the number of sphere-ray intersections for each pixel to the intersection of 
the spheres on the horizontal with the spheres on the vertical.

That should hit the right balance between computational intensity and code 
size.

IMHO, this should be a "same thing" benchmark.

> > > > 7. Symbolic manipulation: write a program which can symbolically
> >
> > Yes, C++ won't be very good at this problem either. Efficient pattern
> > matching
>
> I'm not convinced that C++ will be less efficient. It will be more
> tedious to write though.

Yes, you'll need to write a lot of C++ to get the job done. Amazingly, some 
people do use C++ for this kind of problem though...

-----

Here's my improved C++ version of your integrate benchmark:

#include <cmath>
#include <cstdio>
#include <vector>
#include <algorithm>

using namespace std;

struct intbits {
  double a, b, val, err;
  intbits() : a(0), b(0), val(0), err(0) {}
  intbits(double A, double B, double V, double E) :
    a(A), b(B), val(V), err(E) {}
  bool operator < (const intbits &x) const { return err < x.err; }
};

typedef vector<intbits> intqueue;

double log100 = log(100.);
double f(double x)
{ return sin(x*log(x)) * sin(100.*log100 + (x - 100)*(1 + log100)); }

template <class F>
double gauss5(double a, double b, F &f) {
  double x2 = 0.5*(a + b);
  double x1 = 0.77459666924148337703585307995648*(a - x2) + x2;
  double x3 = 0.77459666924148337703585307995648*(b - x2) + x2;
  return (f(x2)*4./9. + 5./18.*(f(x1) + f(x3)))*(b - a);
}

template <class F>
double gauss7(double a, double b, F &f) {
  double xm = 0.5*(a + b);
  double x1 = 0.86113631159405257522394648889281*(a - xm) + xm,
    x2 = 0.33998104358485626480266575910324*(a - xm) + xm,
    x3 = 0.33998104358485626480266575910324*(b - xm) + xm,
    x4 = 0.86113631159405257522394648889281*(b - xm) + xm;
  return ((0.347854845137453857373063949222/2.0)*(f(x1) + f(x4)) +
   (0.652145154862546142626936050778/2.0)*(f(x2) + f(x3)))*(b - a);
}

template <class F>
intbits integrate_with_error(double eps, double a, double b, F &f) {
  double g7 = gauss7(a, b, f);
  if (b - a < eps/1024.0) return intbits(a, b, g7, 0);
  return intbits(a, b, g7, fabs(gauss5(a, b, f) - g7));
}

void PUSH_QUEUE(intqueue &q, intbits foo)
{ q.push_back(foo); push_heap(q.begin(), q.end()); }

template <class F>
double integrate_once(double eps, double a, double b, F &f, intqueue &q) {
  double x1 = q[0].a, x2 = q[0].b;
  double err = -q[0].err;
  pop_heap(q.begin(), q.end());
  q.pop_back();
  intbits foo = integrate_with_error(eps, x1, 0.5*(x1 + x2), f);
  err += foo.err;
  PUSH_QUEUE(q, foo);
  foo = integrate_with_error(eps, 0.5*(x1 + x2), x2, f);
  err += foo.err;
  PUSH_QUEUE(q, foo);
  return err;
}

template <class F>
double adaptive_integrate(double eps, double a, double b, F &f) {
  intqueue q(0);
  intbits foo = integrate_with_error(eps, a, b, f);
  PUSH_QUEUE(q, foo);
  while (q[0].err > eps) integrate_once(eps, a, b, f, q);
  double err = q[0].err;
  for (int i = 1; i < q.size(); ++i) err += q[i].err;
  while (err > eps) err += integrate_once(eps, a, b, f, q);
  double ret = 0;
  double err2 = 0;
  for (int i = 0; i < q.size(); ++i)
    { ret += q[i].val; err2 += q[i].err; }
  return ret;
}

int main() {
  double foo = adaptive_integrate(0.000001, 1, 10000, f);
  printf("%.16f\n",foo);
  return 0;
}

-----

Here's my OCaml too:

type intbits = { a: float; b: float; v: float; err: float }

exception EmptyHeap

type t = { mutable size: int; mutable data: intbits array }

let create n = 
  if n <= 0 then invalid_arg "create";
  { size = -n; data = [||] }

let resize h =
  let n = h.size in
  assert (n > 0);
  let n' = 2 * n and d = h.data in
  let d' = Array.create n' d.(0) in
  Array.blit d 0 d' 0 n;
  h.data <- d'

let add h x =
  if h.size < 0 then (h.data <- Array.create (- h.size) x; h.size <- 0);
  let n = h.size in
  if n == Array.length h.data then resize h;
  let d = h.data in
  let rec moveup i =
    let fi = (i - 1) / 2 in
    if i > 0 && d.(fi).err < x.err then (d.(i) <- d.(fi); moveup fi)
    else d.(i) <- x in
  moveup n;
  h.size <- n + 1

let maximum h = if h.size <= 0 then raise EmptyHeap; h.data.(0)

let remove h =
  if h.size <= 0 then raise EmptyHeap;
  let n = h.size - 1 in
  h.size <- n;
  let d = h.data in
  let x = d.(n) in
  let rec movedown i =
    let j = 2 * i + 1 in
    if j < n then
      let j = if j+1 < n && d.(j+1).err > d.(j).err then j+1 else j in
      if d.(j).err > x.err then (d.(i) <- d.(j); movedown j)
      else d.(i) <- x
    else d.(i) <- x in
  movedown 0

let fold f h x0 =
  let n = h.size and d = h.data in
  let rec foldrec x i = if i >= n then x else foldrec (f d.(i) x) (succ i) in
  foldrec x0 0

let gauss5 a b f = 
  let x2 = 0.5 *. (a +. b) in
  let x1 = 0.77459666924148337703585307995648 *. (a -. x2) +. x2 in
  let x3 = 0.77459666924148337703585307995648 *. (b -. x2) +. x2 in
  (f x2 *. 4. /. 9. +. 5. /. 18. *. (f x1 +. f x3)) *. (b -.a)

let cst3 = 0.347854845137453857373063949222 /. 2.0
let cst4 = 0.652145154862546142626936050778 /. 2.0

let gauss7 a b f = 
  let xm = 0.5 *. (a +. b) in
  let x1 = 0.86113631159405257522394648889281 *. (a -. xm) +. xm
  and x2 = 0.33998104358485626480266575910324 *. (a -. xm) +. xm
  and x3 = 0.33998104358485626480266575910324 *. (b -. xm) +. xm
  and x4 = 0.86113631159405257522394648889281 *. (b -. xm) +. xm in
  (cst3  *. (f x1 +. f x4) +. cst4 *. (f x2 +. f x3)) *. (b -. a)

let integrate_with_error eps a b f =
  let g7 = gauss7 a b f in
  if b -.a < eps /. 1024.0 then  {a=a; b=b; v=g7; err=0.0}
  else {a=a; b=b; v=g7; err=abs_float (gauss5 a b f -. g7)}

let integrate_once eps a b f q = 
  let ib = let m = maximum q in remove q; m in
  let mid = 0.5 *. (ib.a +. ib.b) in
  let ib1 = integrate_with_error eps ib.a mid f in
  add q ib1;
  let ib2 = integrate_with_error eps mid ib.b f in
  add q ib2;
  ib1.err +. ib2.err -. ib.err

let sum_err q = fold (fun ib y -> ib.err +. y) q 0.0
let sum_v q = fold (fun ib y -> ib.v +. y) q 0.0

let adaptive_integrate eps a b f = 
  let q = create 524288 in
  add q (integrate_with_error eps a b f);
  while ((maximum q).err > eps) do ignore (integrate_once eps a b f q) done;
  let err = ref (sum_err q) in
  while (!err > eps) do err := !err +. integrate_once eps a b f q done;
  sum_v q

let f =
  let log100 = log 100. in
  fun x -> sin(x *. log x) *. sin(100.*.log100 +. (x -. 100.)*.(1. +. log100))

let () = Printf.printf "%.16f\n" (adaptive_integrate 0.000001 1.0 10000.0 f)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists