[Shootout-list] X per second scoring system

Bengt Kleberg bengt.kleberg@ericsson.com
Thu, 30 Sep 2004 11:10:06 +0200


Brandon J. Van Every wrote:
> Bengt Kleberg wrote:
...deleted
> I can only translate this as, "It is work (real problem).  Also I have a
> perceived issue (silly non-problem)."

ok, you may prefer 2-valued logic. personally i prefer shades of gray.


...deleted
> 
> I've looked at several C FFIs in various languages.  They aren't rocket
> science.  The worst of them just make you do some argument swizzling or
> some such.  A timer available in a language is just a function call.
> The cases of 'potential worrisome complication' are exhausted and you're
> just not seeing this.


my definition of ''exhausted'' would, in this case, mean that i had 
looked at every language that could be part of the shootout. 
your definition makes it a lot easier to exhaust the cases.


>>supposed to mean that such a language is disqualified? ie,
>>all languges
>>that does not have sufficiently fine grained and exact timers, nor the
>>ability to call c functions, are not of interest?
> 
> 
> Well, before worrying about it, *are* there any?


>>imho the idea of n is to make the fastest language do the
>>test in x (1, 2, or somehting elkse) seconds.
> 
> 
> But that's not enough time to run anything from a
> startup/shutdown/slicing overhead standpoint.  You need 30..60 seconds
> for *any* language.  Otherwise you're measuring the OS as it burps, not
> the language.


what is it that stops us from using 30..60 as the value to replace 
''somehting elkse''?


> 
>>then the other languages takes longer
>>for the same n, thereby proving that they are weaker in performance.
> 
> 
> If you used the minimum requisite amount of time, 30..60 seconds, for
> the fastest language, then the slowest language is going to be a real
> pig if you don't change N.  Time really shouldn't be changing.  We know
> how much time we need for OS accuracy.  N should be changing for each
> test.

you are correct. i too believe that n should change for each test, and 
each language. it makes it easier to create the graphs; a single n makes 
for lines that are too short.
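picking such an n could be automated. this is just a hypothetical 
sketch (not how the shootout actually does it), where run_test is a 
stand-in for whatever executes the benchmark at problem size n; it 
grows n until one run of the fastest implementation lands in the 
30..60 second window:

```python
import time

def calibrate_n(run_test, target_seconds=30.0, start_n=1):
    """Grow n (at least doubling each step) until one run of
    run_test(n) takes at least target_seconds of wall-clock time.

    run_test is an assumed stand-in for a benchmark runner; it is
    not part of the Shootout harness.
    """
    n = start_n
    while True:
        t0 = time.perf_counter()
        run_test(n)
        elapsed = time.perf_counter() - t0
        if elapsed >= target_seconds:
            return n
        # jump toward the estimated target, but at least double,
        # so calibration terminates in few runs
        n = max(n * 2, int(n * target_seconds / max(elapsed, 1e-9)))
```

the calibrated n would then be recorded per test and per language, so 
that every published timing sits well above the OS noise floor.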


>>>>(http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html) as to why.
>>>
>>>Are you saying you do N calibration to detect loop
>>>optimizers?  Is this
>>>automated in the Shootout, or at least readily graphed and
>>>displayed?
>>>Otherwise the advantage is theoretical.  Also, if you're doing N
>>>calibrations to find linear scaling sequences, you're talking about
>>>doing an awful lot more benchmark runs than if you just had
>>>timers in
>>>the tests and didn't have to guess N.
>>>
>>
>>one quote (out of many) is:
>>
>>Varying the problem size helped us to detect unusual runtime effects,
>>while a graphical presentation highlights patterns and trends
>>in runtime instead of individual performance scores.
>>
>>i find your question strange. you have read the paper, yes?
> 
> 
> Are you doing this kind of thing IN THE SHOOTOUT?  If you're not, then
> this is theoretical nicety, not something you actually take advantage
> of.  You're not analyzing like Knuth analyzed, so his advantages aren't
> your advantages.

i am sure that knuth analyzed in a very good way. not having read his 
works, i am not in a position to discuss them. however, we are (well, i 
am) talking about ''timing trials'', where the quote came from. and yes, 
if you look in the shootout you will find just such a graph.


>>moreover, i also find it hard to understand the question
>>about ''graphed
>>and displayed''. there is a graph for every test.
> 
> 
> You do not have graphs like Knuth's graphs.
>

again, not having read his works, i am not in a position to discuss them. 
however, we are (well, i am) talking about ''timing trials'', where there 
is a graph for every test. you have read the paper, yes?


> Another way to put the point: do you have the foggiest idea about
> whether any test for any language in the Shootout is acutally using a
> loop optimizer to remove work?  Do you have any basis for either
> manually or automatically checking this sort of thing?  If so, when did
> you last perform such checks?

no, i do not have any idea whether any language in the shootout is 
actually using a loop optimizer to remove work. for all i know they may 
just be doing a sleep() until a reasonable amount of time has passed, and 
then printing the expected answer.

unless you are talking about the tests i have implemented myself. i 
would never do such a thing.

given the nice little paper on ''trusting trust'', where it turns out 
that the compiler is meta-cheating, i fail to see how this kind of 
problem could be solved by any reasonable amount of work.
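that said, the n-variation the ''timing trials'' paper describes does 
catch the crude cases. a sketch of the idea (my own illustration, not a 
check the shootout is claimed to perform; run_test again stands in for 
a benchmark runner): time a test at widely spaced problem sizes and 
flag implementations whose runtime stays flat, since flat timings 
suggest the measured work was optimized away, or replaced by a sleep():

```python
import time

def scaling_suspicious(run_test, sizes=(1000, 10000, 100000), factor=2.0):
    """Return True if runtime fails to grow by at least `factor`
    between the smallest and largest problem size.

    A genuine O(n) or worse workload should show a large ratio over
    a 100x spread in n; a flat ratio is suspicious.
    """
    timings = []
    for n in sizes:
        t0 = time.perf_counter()
        run_test(n)
        timings.append(time.perf_counter() - t0)
    return timings[-1] < factor * timings[0]
```

it will not catch a meta-cheating compiler in the ''trusting trust'' 
sense, only the lazy fraud whose output takes the same time at every n.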



bengt