[Shootout-list] Directions of various benchmarks

John Skaller skaller@users.sourceforge.net
Wed, 25 May 2005 22:28:44 +1000


On Wed, 2005-05-25 at 07:42 +0200, Bengt Kleberg wrote:
> On 2005-05-24 18:41, John Skaller wrote:

> moreover, i find it difficult to inform the tested program to back off 
> if it goes over the time out limit. backing off might not be that 
> important, of course.

Part of the problem is minibench -- I guess it is fairly
old now and probably needs a redesign. I'm not sure,
though; part of my problem is that I never really learned
Perl. I couldn't see the point, since I know Python,
which I think scales better to large programs, although
it is a bit clumsier for small ones.

I have some basic code now that does some benchmarking,
mainly to check that my Felix optimisation work is going
ahead. The whole thing, including the benchmarking and
the web pages, is fairly simple, but I'm already finding
some extensions vital:

* I have to compare several Felix versions
* I must be able to handle multiple test machines
* I need to limit the testing time
* I need to know how accurate the results are

In my opinion, on the benchmarking side (as opposed
to the web site), writing code to do any particular
thing is quite easy. The HARD part, as usual, is the
design.

My basic idea for generalisation is a 'test record',
which reports the result of a single test, and contains
identifying information like:

* datetime
* test machine key
* test key
* test argument
* result

and to have a huge list of these, into which results
from many sources can be aggregated over time.
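
In Python (which I'd reach for over Perl) a record might
be no more than a dict -- a sketch only, the field names
and the values below are made up:

    import time

    def make_test_record(machine_key, test_key, argument, result):
        # One row in the big table: who ran what, with what
        # input, and what came out.
        return {
            "datetime":    time.strftime("%Y-%m-%dT%H:%M:%S"),
            "machine_key": machine_key,  # key into the machines table
            "test_key":    test_key,     # key into the tests table
            "argument":    argument,     # e.g. N for ackermann(3, N)
            "result":      result,       # elapsed seconds, or an error code
        }

    records = []  # the huge list results are aggregated into
    records.append(make_test_record("amd64-box", "ackermann-felix", 8, 2.31))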

Auxiliary tables include a description of the
test machines by key: processor, memory, cache size,
speed, hostname, etc.
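
For instance (the numbers are invented):

    # Keyed descriptions of the test machines.
    machines = {
        "amd64-box": {
            "processor": "AMD64",
            "mhz":       1800,
            "memory_mb": 1024,
            "cache_kb":  512,
            "hostname":  "devbox",
        },
    }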

A test consists of a source key (which identifies 
the code to execute), a translator key (which
identifies the translator), a script to build the
test, and a script to run it (which accepts the
argument).
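
Something like this, say -- the keys and commands are
placeholders, not what minibench actually does:

    # How to build and run one test, keyed as above.
    tests = {
        "ackermann-felix": {
            "source_key":     "ackermann.flx",
            "translator_key": "felix-current",
            "build":          "make ackermann",     # placeholder build step
            "run":            "./ackermann {arg}",  # placeholder run step
        },
    }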

With this 'database-like' kind of design, there are
three jobs -- running benchmarks, analysing the
results, and displaying them.

These three things could be quite separate,
connected only by the data design and location
information for the data tables.
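
For example, the coupling could be nothing more than an
agreed flat file of records, one per line -- a sketch:

    import json

    def save_records(records, path):
        # The run phase appends; this file format is the
        # only coupling between the three phases.
        with open(path, "a") as f:
            for r in records:
                f.write(json.dumps(r) + "\n")

    def load_records(path):
        # The analyse and display phases read the same file.
        with open(path) as f:
            return [json.loads(line) for line in f]

    # run phase:     save_records(new_results, "results.jsonl")
    # analyse phase: compute stats over load_records("results.jsonl")
    # display phase: render those stats to HTML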

One reason is that I could run tests at random,
scaling the argument n until a run takes around,
say, 10 seconds, and also re-run each test enough
times that the reliability of the result is
established by some statistical measure -- rather
than just guessing that the result is reliable.
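
The re-running half of that might look like this: repeat
until the standard error of the mean drops below, say,
2% of the mean, or give up after a cap. A sketch:

    import time, statistics

    def time_until_stable(run, max_reps=30, rel_err=0.02):
        # Re-run until the standard error of the mean is
        # small relative to the mean, so the result carries
        # a measured reliability instead of a guess.
        times = []
        sem = float("inf")
        while len(times) < max_reps:
            t0 = time.perf_counter()
            run()
            times.append(time.perf_counter() - t0)
            if len(times) >= 3:
                sem = statistics.stdev(times) / len(times) ** 0.5
                if sem < rel_err * statistics.mean(times):
                    break
        return statistics.mean(times), sem, len(times)

Each record would then carry the mean, the error estimate,
and the repetition count, so a later merge knows how much
weight to give it.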

I also need results from multiple architectures. One
immediate reason: Ackermann's function with Felix is now:

* the same speed as Ocamlopt and gccopt on AMD64
* FASTER than gccopt on x86

Anyhow, I wonder if refactoring the Shootout into
three separate processes as indicated above, with
documented data structures interconnecting them,
would make sense. In particular, running the tests
would produce a single 'set' of output, which is
mergeable with other sets and can be fed into a
statistical analyser to generate data for the web
site, or any other display mechanism.
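
Merging such sets is then just concatenation, since every
record is self-identifying; the analyser groups them by
(machine, test, argument). A sketch:

    from collections import defaultdict

    def group_results(records):
        # Group merged records so per-cell statistics can
        # be computed for the web site (or other display).
        groups = defaultdict(list)
        for r in records:
            key = (r["machine_key"], r["test_key"], r["argument"])
            groups[key].append(r["result"])
        return groups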

-- 
John Skaller, skaller at users.sf.net
PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 
Download Felix here: http://felix.sf.net