[Shootout-list] X per second scoring system, resume

Thu, 30 Sep 2004 14:26:07 +0200

Brandon J. Van Every wrote:
> Bengt Kleberg wrote:
> 
>>the shootout has the possibility to remove the startup/stop times from
>>the timings in the scores. great thing, imho. if the interested party
>>wants startup included (or not) they can choose.
>>
>>for the simple benchmark i suggest removing the startup time by
>>default.
> 
> 
> Startup time varies by language.  Within a language, startup time varies
> by data structures and housekeeping initialized before the language's
> equivalent of main() begins.

this is correct. to remove the startup time from a language's results in
a test, we would need the startup time of that particular language.
this was my intention all along. if it came out as if i wanted to use
one and the same startup time for all languages it was a mistake.

> Shutdown time is highly dependent on the data structures and control
> flow of the test.

this is correct. and, hopefully, a good thing (tm). ie, we have a fixed
shutdown time that is measured for an as-short-as-possible program.
hopefully there is a correlation between a small, fast program without
any data and the smallest possible stop time. there does not have to be,
but we will not be totally off the mark.
then, when we subtract the smallest possible startup and stop time, we
know that it is the test specific times that are left. i want the time
spent in setting up things used in the test to be counted as part of the
time it takes to do the test. this helps to keep programs that want to
speed things up by allocating things in advance honest.
imho.
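
a minimal sketch of the subtraction, in python. the language names and
the baseline numbers are made up for illustration; the real baselines
would come from timing the smallest possible program in each language.

```python
# Hypothetical baselines: wall-clock seconds for the smallest possible
# program in each language (startup + stop only). The numbers are made up.
BASELINE_SECONDS = {"langA": 0.25, "langB": 0.5}


def remove_baseline(language, total_seconds, baselines=BASELINE_SECONDS):
    """Return the time left after removing that language's own baseline."""
    remaining = total_seconds - baselines[language]
    # Timing noise can make a short test look faster than the baseline;
    # clamp at zero rather than report a negative time.
    return max(0.0, remaining)
```

note that any setup the test program does for itself is still inside
total_seconds, so it stays counted as part of the test.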

> Slicing cannot be eliminated.  You can only run a test long enough that
> it's averaged out.  Too short a test, and you get an OS process switch
> spike.
> 
> Let us also not forget OS startup/shutdown timing hiccups.

true.
if we want a perfect timing system we would need to run single user,
without paging, and preferably with no scheduling. real life data, just
as the user will experience it when running programs :-)

> So, given these realities, how does the Shootout allow for startup/stop
> times to be removed from the scores?  I don't see anything in the tests
> themselves to indicate per-test control.  I'm supposing you mean you can
> subtract a number for a completely different test, presumed to represent
> the startup/stop times for a given language.  I say this is a false
> assumption.

it is not totally correct. but i would say that it is much more
truth-like than having the results of many years ago archived and used
for comparing with today's results :-)
and that is ok, is it not :-)

> But if empirical evidence says it's a good assumption, I'll stand
> corrected.  I'd ask how the empirical evidence is derived and verified.

what if we run the smallest program several times and see how much
variation there is in the results?
how much variation would you think is allowable to remove your fears of
slicing, process switch spikes and startup/shutdown timing hiccups?
i ask since it is not a test if we do not know what we are expecting 
before doing it.
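
a rough sketch of such a repeated run, in python. it assumes that the
python interpreter running an empty program can stand in for "the
smallest program"; the function names and the spread measure are my own
choices, not anything the shootout uses.

```python
import statistics
import subprocess
import sys
import time


def time_runs(command, runs=10):
    """Run `command` `runs` times; return wall-clock seconds for each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def relative_spread(timings):
    """(max - min) / median: a rough measure of run-to-run variation."""
    return (max(timings) - min(timings)) / statistics.median(timings)


if __name__ == "__main__":
    # Assumption: an empty Python program stands in for the smallest
    # program of some language.
    smallest = [sys.executable, "-c", "pass"]
    timings = time_runs(smallest)
    print("min %.4f s, median %.4f s, spread %.1f%%" % (
        min(timings), statistics.median(timings),
        100 * relative_spread(timings)))
```

the allowable spread would have to be agreed on before running it, and
then compared with what this prints.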

bengt