[Shootout-list] Directions of various benchmarks

John Skaller skaller@users.sourceforge.net
Fri, 27 May 2005 23:24:44 +1000


On Fri, 2005-05-27 at 09:51 +0200, Bengt Kleberg wrote:
> On 2005-05-25 14:28, John Skaller wrote:
> ...deleted
> > I have some basic code now that does some benchmarking,
> > mainly to check that my Felix optimisation work is going ahead:
> > the whole thing including benchmarking and web are fairly simple,
> > but I'm finding some extensions vital already: 
> > 
> > * I have to compare several Felix versions
> > * I must be able to handle multiple test machines
> > * I need to limit the testing time
> > * I need to know how accurate the results are
> 
> the current shootout does the time limitation bit on a per-test basis. 
> do you mean the same thing here, or do you mean on a global scale (ie 
> run these tests on these languages in no more than x hours, adjusting 
> all ''n'' to fit the time)?

Um .. I need to be able to run the tests and have them
finish in an 'estimable' and 'controllable' time, so my
computer is useful for things other than running tests.

Exactly how to control that -- for either my tests or the
Shootout -- is a difficult issue, I think.

Certainly if I ran the Shootout on my box, I couldn't leave
the tests running for 24 hours. I'd go nuts -- TV here
isn't that good :)
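
One simple scheme is a global budget: give each test whatever time
remains, and stop when the budget is spent. A rough Python sketch
(the harness details are invented, purely to illustrate the idea):

    import subprocess, time

    def run_with_budget(tests, budget):
        """Run each (command, arg) pair until the global budget is spent.
        'tests' is a list like [(["./ackermann"], 9), ...] -- hypothetical."""
        start = time.time()
        results = []
        for cmd, arg in tests:
            remaining = budget - (time.time() - start)
            if remaining <= 0:
                break                                # budget exhausted
            t0 = time.time()
            try:
                subprocess.run(cmd + [str(arg)], timeout=remaining)
                results.append((cmd, arg, time.time() - t0))
            except subprocess.TimeoutExpired:
                results.append((cmd, arg, None))     # ran out of time
        return results

A per-test limit then falls out as a special case: just cap
'remaining' at some per-test maximum.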

> > My basic idea for generalisation is a 'test record',
> > which reports the result of a single test, and contains
> > identifying information like:
> > 
> > * datetime
> > * test machine key
> > * test key
> > * test argument
> > * result
> 
> do you mean result as in the current 3 metrics (time, memory-usage and 
> loc) or result as in the result of a test?

Good question.

> is this text or binary records?

For communication it would have to be text, and I'd
probably store it that way too, until the file(s)
got so large that something had to be done about it.
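
For instance, one record per line, tab-separated, so files from many
sources can simply be concatenated and sorted. The field layout below
is only illustrative:

    # datetime             machine  test       arg  result
    # 2005-05-27T13:20:11  mbox1    ackermann  9    4.32

    def parse_record(line):
        """Split one tab-separated test record into a dict."""
        fields = ["datetime", "machine", "test", "arg", "result"]
        return dict(zip(fields, line.rstrip("\n").split("\t")))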


> > and to have a huge list of these to which results
> > from many sources can be aggregated over time. 
> > 
> > Auxiliary tables include a description of the
> > test machines by key: processor, memory, cache size,
> > speed, hostname, etc.
> > 
> > A test consists of a source key (which identifies 
> > the code to execute), and a translator key,
> > which identifies the translator, a script
> > to build the test, and a test to run it (which 
> > accepts the argument).
> 
> ok with this, apart from the last item. what do you mean by ''a test to 
> run it''?

I think that should have read 'a script to run it'.
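
So a test description bundles those four pieces together -- something
like this sketch, where all the keys and script names are hypothetical:

    # A test: source key + translator key + build script + run script.
    # The run script accepts the argument N on its command line.
    test = {
        "source":     "ackermann",            # identifies the code to execute
        "translator": "felix-1.0.20",         # identifies the translator
        "build":      "./build-ackermann.sh",
        "run":        "./run-ackermann.sh",   # invoked as: run-ackermann.sh N
    }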

> moreover, you might want to have the build depend upon the translator 
> key and the test key (since the shootout has different build flags (and 
> test flags) for different translators and tests).

The initial design wouldn't permit that, because it doesn't make
sense to compare 'gcc' on one test with 'gcc -O3' on another.

However, that's a very inflexible approach: if you had 20 options
with two states each, that would be up to 2^20 translators .. clearly
absurd.

So .. yes, you could refine the notion of translator to be

(a) major tool
(b) options

for example. Please note that I'm only trying to explore
an idea. In some ways this all follows from Isaac's work,
decoupling and factoring.
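
Canonicalising the pair into a string key keeps 'gcc' and 'gcc -O3'
related but distinct -- a sketch:

    def translator_key(tool, options):
        """Canonical key for a translator: the tool plus sorted options.
        E.g. translator_key("gcc", ["-O3"]) -> "gcc -O3"."""
        return " ".join([tool] + sorted(options))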

> > Anyhow, I wonder if refactoring the Shootout into
> > 3 separate processes as indicated above, with
> > documented data structures interconnecting them

> sounds ok to me. i think it ought to be very simple to get the latest 
> results of all languages and all tests from the system since i suppose 
> that is the number one usage of this data.

Good point: you're saying the existing minibench already produces
enough data; it just needs to be collated.
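
Collation is then just one pass over the record list, keeping the
newest record per key -- a sketch, assuming the dict records from the
earlier example (ISO datetimes compare correctly as plain strings):

    def latest_results(records):
        """Keep only the newest record per (machine, test, arg)."""
        latest = {}
        for r in records:
            key = (r["machine"], r["test"], r["arg"])
            if key not in latest or r["datetime"] > latest[key]["datetime"]:
                latest[key] = r
        return latest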

> 
> one thing i consider important is to have a per-language 
> configuration. i would appreciate it if i could add flags for a new 
> language without editing a file with lots of other language flags in it.

Do you mean 'translator' rather than 'language'?
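
Either way, one small file per translator would give you that
isolation. A hypothetical layout and reader:

    import os

    def load_translator_config(name, confdir="conf"):
        """Read flags for one translator from conf/<name>, e.g.

            build_flags = -O3
            test_flags  =

        Adding a new translator never touches anyone else's file."""
        cfg = {}
        with open(os.path.join(confdir, name)) as f:
            for line in f:
                if "=" in line:
                    key, _, value = line.partition("=")
                    cfg[key.strip()] = value.strip()
        return cfg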


-- 
John Skaller, skaller at users.sf.net
PO Box 401 Glebe, NSW 2037, Australia Ph:61-2-96600850 
Download Felix here: http://felix.sf.net