[Shootout-list] X per second scoring system

Wed, 29 Sep 2004 02:44:00 -0700

Bengt Kleberg wrote:
>
> ''x per second'' is much more difficult to measure than
> ''time to do x''

C'mon, we're all computer guys here.  Surely you jest.

> it would also take a long time to run all tests on all languages since
> we are using trial-and-error to find the right n for each language.

Which is why you shouldn't 'find the right n'.  You run any given test
for 60 seconds.  However many times it runs, that 'n' is your score!
Works fine for Viewperf, where everything is scored in terms of 'frames
per second'.  You don't sit around trying to guess n's.
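
A minimal sketch of that fixed-time scoring in C.  The names
wall_seconds, runs_per_second, and sample_work are illustrative
assumptions, not anything taken from the Shootout or Viewperf.  The
clock is C11 timespec_get, which gives wall-clock time and is not
guaranteed to be monotonic; a real harness would probably want a
monotonic clock.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds, via C11 timespec_get (not monotonic). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: call work() over and over until budget_seconds
 * have elapsed, then return completed runs per second.  The raw run
 * count is stored in *runs_out when runs_out is not NULL. */
double runs_per_second(void (*work)(void), double budget_seconds,
                       unsigned long *runs_out)
{
    assert(work != NULL && budget_seconds > 0.0);

    unsigned long runs = 0;
    double start = wall_seconds();
    double elapsed = 0.0;

    do {
        work();                 /* one full run of the test */
        runs++;
        elapsed = wall_seconds() - start;
    } while (elapsed < budget_seconds);

    if (runs_out != NULL)
        *runs_out = runs;
    return (double)runs / elapsed;
}

/* Placeholder workload, standing in for a real test body. */
void sample_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000; i++)
        sink += i;
}

/* Usage (hypothetical):
 *     unsigned long n;
 *     double score = runs_per_second(sample_work, 60.0, &n);
 *     printf("%lu runs, %.2f runs per second\n", n, score);
 */
```

Nobody has to guess n here: the time budget is the only input, and n
falls out of the run.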

> the nice thing is that we would not have to rerun any test unless the
> language/operating system/hardware changed.

You've lost me as to what you're trying to avoid.  In a positive scoring
system, the results are the results.  If you want to massage and
normalize the data afterwards, that's a postprocess on the data.
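
As one example of such a postprocess, here is a sketch that rescales
raw scores relative to the best one.  The function name
normalize_to_best and the choice of "divide by the best score" are
assumptions for illustration; any other normalization could be applied
to the same raw numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical postprocess: divide every raw score by the highest one,
 * in place, so the best entry becomes 1.0.  Returns the best raw score.
 * Requires at least one score and a positive best score. */
double normalize_to_best(double *scores, size_t count)
{
    assert(scores != NULL && count > 0);

    double best = scores[0];
    for (size_t i = 1; i < count; i++) {
        if (scores[i] > best)
            best = scores[i];
    }
    assert(best > 0.0);

    for (size_t i = 0; i < count; i++)
        scores[i] /= best;

    return best;
}
```

The raw scores never need to be measured again for this; the
normalization runs purely on stored results.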

> the alternative would be to run the test with a very high n, and
> interrupt the test after a certain time. this seems error prone to me,

You run the test for 30 to 60 seconds so that performance is averaged
over a long, reasonable time.  This is standard drill in professional
benchmarking.  What are you trying to achieve, 3 second benchmarking or
something?  Then you're going to get a lot of error due to the startup,
time slicing, and paging of a multitasking OS.

> as the language would have to be interrupted in a polite manner.

Why?  Stop your scoring timer first, then kill the thing.
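
A sketch of that ordering on a POSIX system: read the score first, then
kill the process without ceremony.  Everything here is an assumption
for illustration (the names score_then_kill and tiny_work, and the use
of a shared-memory counter so the parent can read the child's
progress); it is not how the Shootout actually runs its tests.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical harness: run work() in a child process for budget_ms
 * milliseconds and return how many runs it completed.  The score is
 * read BEFORE the child is killed, so the kill need not be polite. */
unsigned long score_then_kill(void (*work)(void), unsigned budget_ms)
{
    /* Counter in shared memory, visible to both parent and child. */
    void *mem = mmap(NULL, sizeof(atomic_ulong), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    atomic_ulong *counter = mem;
    atomic_init(counter, 0);

    pid_t child = fork();
    assert(child >= 0);
    if (child == 0) {
        for (;;) {              /* child: loop until killed */
            work();
            atomic_fetch_add(counter, 1);
        }
    }

    struct timespec budget = { budget_ms / 1000,
                               (long)(budget_ms % 1000) * 1000000L };
    nanosleep(&budget, NULL);

    unsigned long score = atomic_load(counter);  /* stop scoring first */
    kill(child, SIGKILL);                        /* then kill the thing */
    waitpid(child, NULL, 0);
    munmap(mem, sizeof(atomic_ulong));
    return score;
}

/* Placeholder workload, standing in for a real test body. */
void tiny_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000; i++)
        sink += i;
}
```

Whatever the child was doing when SIGKILL arrived is simply not
counted; only completed runs reach the counter.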

> otherwise we would lose the value we want from the test (the
> x times it managed to do the test).

You are saying you don't have a counting loop in the source code of all
the tests?  Are you measuring the time to start up and shut down all the
program processes?  I don't see why you should be.  It's not
interesting, unless quick startup and shutdown is exactly what you want
to test.

> moreover, some tests are not possible to do this
> way (reverse)

Just loop it again.  I think the problem is you're thinking that "loop
timers" around the source are a bad idea or something.  I will need to
look at the Shootout source code to understand exactly what your
objection is here.  What you're saying is making no sense compared to
other benchmarking suites I'm familiar with.  I would simply have start,
stop, loop, and counter functions implemented in a trivial C library.  A
language test would either have to call the C library or implement
equivalent timers.
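
To make that concrete, a minimal sketch of what such a trivial C
library could look like.  The bench_* names and the struct layout are
invented here for illustration and do not come from the Shootout
source; timing uses C11 timespec_get (wall clock, not monotonic).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* State for one timed test (hypothetical layout). */
typedef struct {
    double start_time;      /* seconds, set by bench_start */
    double stop_time;       /* seconds, set by bench_stop  */
    double budget_seconds;  /* how long the test should run */
    unsigned long count;    /* completed iterations         */
} bench_t;

static double bench_now(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* start: reset the counter and start the timer. */
void bench_start(bench_t *b, double budget_seconds)
{
    assert(b != NULL && budget_seconds > 0.0);
    b->budget_seconds = budget_seconds;
    b->count = 0;
    b->start_time = bench_now();
    b->stop_time = b->start_time;
}

/* loop: call after each completed iteration.  Counts it and returns
 * 1 while time remains in the budget, 0 once the budget is used up. */
int bench_loop(bench_t *b)
{
    b->count++;
    return (bench_now() - b->start_time) < b->budget_seconds;
}

/* stop: stop the scoring timer. */
void bench_stop(bench_t *b)
{
    b->stop_time = bench_now();
}

/* counter: completed iterations so far. */
unsigned long bench_count(const bench_t *b)
{
    return b->count;
}

/* Score as iterations per second; 0.0 if no time has elapsed. */
double bench_score(const bench_t *b)
{
    double elapsed = b->stop_time - b->start_time;
    return elapsed > 0.0 ? (double)b->count / elapsed : 0.0;
}

/* Usage (run_test is a hypothetical test body):
 *     bench_t b;
 *     bench_start(&b, 60.0);
 *     do { run_test(); } while (bench_loop(&b));
 *     bench_stop(&b);
 */
```

A language that cannot easily call into C would need its own
equivalents of these five functions, which is a few lines in most
languages.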

Cheers,                     www.indiegamedesign.com
Brandon Van Every           Seattle, WA

"The pioneer is the one with the arrows in his back."
                          - anonymous entrepreneur