[Shootout-list] bench mark test run time, max bad score

Tue, 28 Sep 2004 11:27:22 -0700

Bengt Kleberg wrote:
>
> i suggested that the run time for the fastest implementation on a
> particular test should always be (about) 1 second.

That will play havoc with historical archives.  Computers are going to
get faster.  In 2008 some machine is going to be blowing the doors off
of some test.  Are you just going to renormalize everything and make it
difficult to compare with the results from 2004?  This is a subtle
reason why additive systems are better.  In an additive system, you're
just dealing with bigger and bigger numbers for 'work per second' as
computers get faster.
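
To make the archive point concrete, here is a rough sketch.  The
implementation names and timings are made up purely for illustration,
and `renormalized` / `work_per_second` are hypothetical helpers, not
anything the shootout actually has.  It assumes 'renormalizing' means
scaling things so the fastest entry always reads 1.

```python
def renormalized(times):
    """Scale run times so the fastest implementation reads 1.0."""
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}


def work_per_second(times, iterations=1):
    """Additive view: how many test iterations get done per second."""
    return {name: iterations / t for name, t in times.items()}


# Hypothetical run times in seconds; the 2008 machine is twice as fast.
times_2004 = {"impl_a": 2.0, "impl_b": 4.0}
times_2008 = {"impl_a": 1.0, "impl_b": 2.0}

# Renormalized scores are identical in both years, so the archive
# cannot show that the hardware got faster.
same_after_renormalizing = renormalized(times_2004) == renormalized(times_2008)

# Work per second simply grows, so the two years compare directly.
speedup = work_per_second(times_2008)["impl_a"] / work_per_second(times_2004)["impl_a"]
```

Under these assumptions the renormalized tables for 2004 and 2008 come
out the same, while the work-per-second numbers just get bigger.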

> And a hard limit should be in place at 300 (or 512, or something else)
> seconds. stop all entries that run for longer. give them max-bad-score.

In an additive scoring system, that seems quite generous and excessive.
Why not just a 60 second run for any given test, and whatever is
measured is measured?  It's all about the amount of work you can get
done in 60 seconds.  One might also consider the issue of 'warm up
periods' for certain kinds of tests, but over 60 seconds, perhaps this
can be blown off.
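
A minimal sketch of what such a fixed-length run might look like.  The
function name, the `clock` parameter, and the 60 second default are my
own assumptions for illustration.  Note that it only checks the clock
between iterations, so the last iteration can overrun the budget, and
a test that hangs would still need to be killed from outside.

```python
import time


def iterations_per_second(work, budget_seconds=60.0, clock=time.perf_counter):
    """Call work() repeatedly until the budget is spent.

    Returns completed iterations divided by the time actually elapsed.
    """
    start = clock()
    deadline = start + budget_seconds
    completed = 0
    while clock() < deadline:
        work()
        completed += 1
    elapsed = clock() - start
    return completed / elapsed if elapsed > 0 else 0.0
```

Usage would be something like `iterations_per_second(run_one_test)`,
where `run_one_test` is whatever callable performs a single iteration
of the test.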

> if we adopt lowest-score-is-best then we could have seconds
> as our score units.

In an additive scoring system, the test itself would be the definition
of a work unit.  This is, factually speaking, always how it is anyway.
Probably one would call it 'test iterations per second'.  Higher
iteration scores are better.

> penalise the failures with N multiplied with the hard
> limit, ie this is max-bad-score.

Actually, a nice consequence of a positive scoring system is that failing
tests award themselves a zero, or whatever minuscule amount of work they
can actually achieve.  No need to figure out a max-bad-score.
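
As a sketch of why no max-bad-score is needed: assume each test runs
for a fixed window and we record how many iterations it completed.
`WINDOW_SECONDS`, `additive_score`, and summing rates across tests are
my assumptions about what 'additive' would mean in practice, not a
settled design.

```python
WINDOW_SECONDS = 60.0  # assumed fixed run length per test


def additive_score(completed_iterations):
    """Sum iterations per second over all tests.

    completed_iterations maps test name -> iterations finished in the
    window.  A test that crashed or got nothing done is recorded as 0
    and simply adds nothing to the total.
    """
    return sum(count / WINDOW_SECONDS for count in completed_iterations.values())
```

A failed test just shows up as a 0 in the input, and a very slow one as
a small number.  Neither case needs special handling.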

However, there's still a difference between a test that's available and
a test that isn't.  I still think we need to rank / penalize languages
on the number of *available* tests they have.  Otherwise some
one-hit-wonder language could come by, skip a few tests, blow the doors
off of those, and actually place decently in the rankings.  I'm saying
we should penalize an incomplete test suite as a matter of policy, even
if 'the math' says such languages should place well.
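
One possible way to encode such a policy, purely as a sketch: rank
first on how many tests a language has available, and only then on its
additive score.  `rank_languages` and the data layout are hypothetical,
and a softer penalty (say, scaling the score by the fraction of tests
available) would be another option.

```python
def rank_languages(entries, all_tests):
    """Rank languages by completeness first, then by total score.

    entries maps language -> {test name: iterations per second}.
    A test missing from a language's dict counts as unavailable.
    """
    def sort_key(language):
        rates = entries[language]
        available = sum(1 for test in all_tests if test in rates)
        total = sum(rates.get(test, 0.0) for test in all_tests)
        return (available, total)

    return sorted(entries, key=sort_key, reverse=True)
```

With this ordering a language that implements one test spectacularly
well still places below any language that implements more of them.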

Cheers,                     www.indiegamedesign.com
Brandon Van Every           Seattle, WA

"The pioneer is the one with the arrows in his back."
                          - anonymous entrepreneur