[Shootout-list] Numerical medium-sized benchmarks

Fri, 25 Mar 2005 02:00:00 -0800

Hello,

When Doug Bagley released the original language shootout it had a
tremendous publicity effect for ocaml. It was featured on slashdot and
suddenly, everybody cool wanted to code in ML. Like everybody else, I
learnt ML, and it's a very good language. In a majority of the Doug's
benchmarks, ocaml was performing on par with C and C++, which was very
impressive. In a few benchmarks, ocaml was even better than C.

Let me give you some background. I am a postdoc in scientific
computing so I write a lot of numerical code. But I've also worked at
numerous computer companies, including a stint for the XBox. I've
noticed that a (large?) fraction of game software has needs that are
similar to scientific computing. For instance, the physics simulation
is basically an ODE solver, and the 3d math usually involves very long
vectors of floating point numbers. So even though my good friend
Patrick (http://plam.csail.mit.edu/) assures me that my requirements
are niche at best, I am convinced that the game people (which some
argue drive performance pc computing) have similar requirements.

So I learnt ML and started writing some simple numerical algorithms
before writing the more sophisticated stuff I really wanted to do.
Unfortunately, the performance was disappointing (see for instance
http://groups-beta.google.com/group/comp.lang.functional/browse_frm/thread/=
4817bd07ade7a9f9/ebc182b6cd6005c8).
The ML code ran in more than twice the time the C++ code took, and
Haskell was about eleven times slower, and the code I had wasn't yet
at the level of abstraction I needed. Furthermore, it started looking
as if the more abstract code I intended to write might be harder to
write in ML and Haskell.

So it seemed to me that the Bagley benchmarks were very poor
representatives of what I would actually end up getting with these
languages. I looked at them more closely.

Most of the benchmarks do not test floating point performance, and
only use integers. Since many languages like ocaml unbox all integers
automatically but not necessarely all floating point values, this
makes all the functional languages look better ("functions and
integers are first class, the rest are second class.") I think that
tests ought to be written to challenge these languages a bit more. I
see that you have added an n-body benchmark, which is an excellent
step in the right direction, but more should be done.

1) Complex numbers are likely to be a weak point of some of these languages=
.
2) Vectors of not-doubles (say, vectors of complex numbers) would be
very interesting, as they are possibly not "special cased" in the
compilers.
3) Interval arithmetic and automatic differentiation would let us see
not only how well the compilers behave when faced with nontrivial
calculations, but also how concise they actually are when writing
nontrivial math things. Ocaml always has a tiny line count, but it may
suddenly have a larger line count when people have to write functorial
code and figure out that they can no longer use (+) and (+.) (Haskell
might be saved by type classes.)
4) Automatically differentiated complex variables over intervals of 80
bit floats are very likely to pose problems for all these special
casing compilers. The C++ signature for that type in my code might
look like autodiff<complex<interval<long double> > >.
5) If you do things like use several nontrivial algorithms (that call
each other) in the same benchmark, and if you also pass callbacks and
functions as parameters, you can defeat some of the "cheating"
optimization that simplifies all trivialities. For instance, I have an
adaptive integrator that stored intermediate results in a heap data
structure; this has a tendency to screw around with a lot of languages
(see below).
6) The benchmarks should not all be easily implemented on top of
LAPACK. Otherwise, numerical python will be faster than C.

I would like to contribute some tests to the suite in that general
direction. Because they have to be implemented in many languages, I
will try to keep them short (comparable to the existing tests) but I
would like to be less "micro" and more "medium sized."

I am attaching an example (I wrote these three programs.) This example
is drawn from the SIAM 100 digit challenge
http://web.comlab.ox.ac.uk/oucl/work/nick.trefethen/hundred.html
question 3. Our team had two solutions, this is the more
brute-forceish one. The problem is to compute ten significant digits
of the norm of a certain infinite matrix. Here are the timings I get
for this benchmark:

gcc version 0m3.640s
ocaml version 0m5.781s
python version 3m56.406s

This code requires vectors, so numerical python would help, however
the matrix is large and should not be instanciated. One would think
that the overhead of creating a temporary vector and initializing it n
times for each iteration in order to use a linear algebra package
would defeat its performance gain, so numerical python probably does
poorly here.

The algorithm is as follows:

1) Let A be the infinite matrix
1 1/2 1/4 ...
1/3 1/5 ...
1/6 ...
...
2) Let n be an integer and An be the nxn top-left submatrix of A.
3) Let u_0=3D1, the n-dimensional vector of ones.
4) for i=3D0 to 19, u_{k+1}=3DAn*An u_k, where An* is the matrix transpose =
of An.
5) The norm of A is sqrt(u_20 . u_19 / u_19 . u_19)

In this benchmark code, I took n=3D2000 but when I was solving the
original problem, for more safety I was using n=3D20000 or so. However,
I was afraid python would take years to complete. n=3D2000 gives ten
digits.

Please also consider the numerical integration code at
http://www.math.mcgill.ca/loisel/numerical-integration.html, which
shows Haskell and ocaml in an even worse light. The C++ and Haskell
code there is by me, but the ocaml code there is modified from my
Haskell by Xavier Leroy and Oleg Trott.

Would such benchmarks be interesting? How many/which languages should
I implement them in?

Cheers,

--=20
S=E9bastien Loisel -- http://www.math.mcgill.ca/loisel/

------8<----cut here------8<-----------8<---------------8<-------

#include <stdio.h>
#include <math.h>

double eval_A(int i, int j) { return 1.0/((i+j)*(i+j+1)/2+i+1); }

void eval_A_times_u(int N, const double u[], double Au[])
{
  int i,j;
  for(i=3D0;i<N;i++)
    {
      Au[i]=3D0;
      for(j=3D0;j<N;j++) Au[i]+=3Deval_A(i,j)*u[j];
    }
}

void eval_At_times_u(int N, const double u[], double Au[])
{
  int i,j;
  for(i=3D0;i<N;i++)
    {
      Au[i]=3D0;
      for(j=3D0;j<N;j++) Au[i]+=3Deval_A(j,i)*u[j];
    }
}

void eval_AtA_times_u(int N, const double u[], double AtAu[])
{ double v[N]; eval_A_times_u(N,u,v); eval_At_times_u(N,v,AtAu); }

int main()
{
  int N=3D2000,i;
  double u[N],v[N],vBv,vv;
  for(i=3D0;i<N;i++) u[i]=3D1;
  for(i=3D0;i<10;i++)
    {
      eval_AtA_times_u(N,u,v);
      eval_AtA_times_u(N,v,u);
    }
  vBv=3Dvv=3D0;
  for(i=3D0;i<N;i++) { vBv+=3Du[i]*v[i]; vv+=3Dv[i]*v[i]; }
  printf("%0.20f\n",sqrt(vBv/vv));
  return 0;
}

------8<----cut here------8<-----------8<---------------8<-------

let eval_A i j =3D 1.0 /. (float_of_int ((i+j)*(i+j+1)/2+i+1))
let eval_A_times_u u v =3D
  for i =3D 0 to (Array.length v) - 1 do
    v.(i) <- 0.0 ;
    for j =3D 0 to (Array.length u) - 1 do
      v.(i) <- v.(i) +. (eval_A i j) *. u.(j)
    done
  done
let eval_At_times_u u v =3D
  for i =3D 0 to (Array.length v)-1 do
    v.(i)<-0.0 ;
    for j =3D 0 to (Array.length u)-1 do
      v.(i)<-v.(i) +. (eval_A j i) *. u.(j)
    done
  done
let print_array u =3D
  for i =3D 0 to (Array.length u)-1 do
    Printf.printf " %f" u.(i)
  done;
  Printf.printf "\n"

let eval_AtA_times_u u v =3D
  let w =3D Array.create (Array.length u) 0.0 in
  eval_A_times_u u w; eval_At_times_u w v

let n=3D2000
let u =3D Array.create n 1.0
let v =3D Array.create n 0.0 ;;

for i =3D 0 to 9 do
  eval_AtA_times_u u v; eval_AtA_times_u v u
done;;

let vv =3D ref 0.0
let vBv =3D ref 0.0 ;;

for i=3D0 to n-1 do
  vv :=3D !vv +. v.(i) *. v.(i);
  vBv :=3D !vBv +. u.(i) *. v.(i)
done;;

Printf.printf "%.20f\n" (sqrt (!vBv /. !vv));;

------8<----cut here------8<-----------8<---------------8<-------

#!/usr/bin/python
#numarray version left as an exercise

from math import sqrt
def eval_A(i,j):
=09return 1.0/((i+j)*(i+j+1)/2+i+1)

def eval_A_times_u(u):
=09v=3D[0]*len(u)
=09for i in range(len(u)):
=09=09for j in range(len(u)):
=09=09=09v[i]=3Dv[i]+eval_A(i,j)*u[j]
=09return v

def eval_At_times_u(u):
=09v=3D[0]*len(u)
=09for i in range(len(u)):
=09=09for j in range(len(u)):
=09=09=09v[i]=3Dv[i]+eval_A(j,i)*u[j]
=09return v

def eval_AtA_times_u(u):
=09return eval_At_times_u(eval_A_times_u(u))

n=3D2000
u=3D[1]*n
for i in range(10):
=09v=3Deval_AtA_times_u(u)
=09u=3Deval_AtA_times_u(v)
vBv=3D0
vv=3D0
for i in range(n):
=09vBv=3DvBv+u[i]*v[i]
=09vv=3Dvv+v[i]*v[i]
print sqrt(vBv/vv)