[Shootout-list] X per second scoring system

Thu, 30 Sep 2004 01:11:29 -0700

Bengt Kleberg wrote:
> Brandon J. Van Every wrote:
> >
> > I can respect the argument of 'it being work'.  But 'it's
> > tough to look at' is just silly.
>
> how about
> not only is it work (major problem), but it is also tough to
> look at (minor problem).

I can only translate this as, "It is work (real problem).  Also I have a
perceived issue (silly non-problem)."

> if it would give us a major benefit i can take the work

That is the only criterion I'm personally interested in.  I don't see a
point in changing lots of stuff for minor gains.  I haven't quite
reasoned through the 'gains', though.

> > I can't think of any reason why that would be true.
> > Languages either
> > have a way to call timers or C functions, or they don't.  There's
> > nothing complicated about it.
>
> neither can i think of any reason why it would be
> complicated. i do not
> assume that my inability to come up with such a reason makes it
> impossible.

I've looked at several C FFIs in various languages.  They aren't rocket
science.  The worst of them just make you do some argument swizzling or
some such.  A timer available in a language is just a function call.
The cases of 'potential worrisome complication' are exhausted and you're
just not seeing this.
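
To illustrate the "a timer is just a function call" point, here is a
minimal sketch in Python.  The names time_call and sum_to are
hypothetical and exist only for this example; time.perf_counter is
Python's standard high-resolution monotonic clock.  Other languages
would use their own timer or an FFI call to a C timer instead.

```python
import time


def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call of func(*args)."""
    start = time.perf_counter()  # read the clock before the work
    result = func(*args)
    elapsed = time.perf_counter() - start  # read it again afterwards
    return result, elapsed


def sum_to(n):
    # Hypothetical workload, used only for illustration.
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    value, seconds = time_call(sum_to, 100_000)
    print(f"sum_to returned {value} in {seconds:.6f} s")
```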

> supposed to mean that such a language is disqualified? ie,
> all languages
> that do not have sufficiently fine grained and exact timers, nor the
> ability to call c functions, are not of interest?

Well, before worrying about it, *are* there any?

> imho the idea of n is to make the fastest language do the
> test in x (1, 2, or something else) seconds.

But that's not enough time to run anything once you account for
startup, shutdown, and time-slicing overhead.  You need 30..60 seconds
for *any* language.  Otherwise you're measuring the OS as it burps, not
the language.

> then the other languages takes longer
> for the same n, thereby proving that they are weaker in performance.

If you used the minimum requisite amount of time, 30..60 seconds, for
the fastest language, then the slowest language is going to be a real
pig if you don't change N.  Time really shouldn't be changing.  We know
how much time we need for OS accuracy.  N should be changing for each
test.
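
A rough sketch of the fixed-time, variable-N idea, in Python: the time
budget stays constant and the amount of work completed becomes the
score ("X per second").  The names run_for and score_per_second and the
batch parameter are assumptions made for this example, not anything the
Shootout provides.  A real run would use a 30..60 second budget; the
demo below uses a much shorter one.

```python
import time


def run_for(work, budget_seconds, batch=1000):
    """Call work() in batches until budget_seconds has elapsed.

    Returns (calls_made, elapsed_seconds).  The clock is read once per
    batch so that timer overhead stays small relative to the work.
    """
    calls = 0
    start = time.perf_counter()
    while True:
        for _ in range(batch):
            work()
        calls += batch
        elapsed = time.perf_counter() - start
        if elapsed >= budget_seconds:
            return calls, elapsed


def score_per_second(work, budget_seconds, batch=1000):
    """Score a workload as completed calls per second of wall-clock time."""
    calls, elapsed = run_for(work, budget_seconds, batch)
    return calls / elapsed


if __name__ == "__main__":
    # Hypothetical workload: a small, fixed unit of arithmetic.
    rate = score_per_second(lambda: sum(range(100)), budget_seconds=0.5)
    print(f"{rate:.0f} calls per second")
```

With this shape a slow language simply completes fewer calls in the
same wall-clock time, instead of running many times longer for a fixed
N.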

> >>(http://cm.bell-labs.com/cm/cs/who/bwk/interps/pap.html) as to why.
> >
> > Are you saying you do N calibration to detect loop
> > optimizers?  Is this
> > automated in the Shootout, or at least readily graphed and
> > displayed?
> > Otherwise the advantage is theoretical.  Also, if you're doing N
> > calibrations to find linear scaling sequences, you're talking about
> > doing an awful lot more benchmark runs than if you just had
> > timers in
> > the tests and didn't have to guess N.
> >
>
> one quote (out of many) is:
>
> Varying the problem size helped us to detect unusual runtime effects,
> while a graphical presentation highlights patterns and trends
> in runtime instead of individual performance scores.
>
> i find your question strange. you have read the paper, yes?

Are you doing this kind of thing IN THE SHOOTOUT?  If you're not, then
this is theoretical nicety, not something you actually take advantage
of.  You're not analyzing like Kernighan and Van Wyk analyzed, so their
advantages aren't your advantages.

> moreover, i also find it hard to understand the question
> about ''graphed
> and displayed''. there is a graph for every test.

You do not have graphs like the graphs in that paper.

Another way to put the point: do you have the foggiest idea about
whether any test for any language in the Shootout is actually using a
loop optimizer to remove work?  Do you have any basis for either
manually or automatically checking this sort of thing?  If so, when did
you last perform such checks?
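
One possible automated check, sketched in Python under stated
assumptions: time the same test at several values of N and see whether
the cost per item stays roughly constant.  If run time barely grows as
N grows, the work may have been optimized away.  The function names and
the 25% tolerance are hypothetical choices for this example, and the
check assumes the test is expected to scale linearly in N.  Startup
overhead at small N will also lower the apparent per-item cost, so a
failure here is a prompt to look closer, not proof of anything.

```python
import time


def measure(work, sizes):
    """Time work(n) once for each n in sizes; return elapsed seconds."""
    timings = []
    for n in sizes:
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings


def per_item_costs(sizes, seconds):
    """Seconds per unit of N for each (N, time) pair."""
    return [t / n for n, t in zip(sizes, seconds)]


def looks_linear(sizes, seconds, tolerance=0.25):
    """True if per-item cost varies by no more than `tolerance` overall.

    A sharply falling per-item cost as N grows suggests that run time
    is not tracking N, e.g. because a loop was removed.
    """
    costs = per_item_costs(sizes, seconds)
    lowest, highest = min(costs), max(costs)
    if lowest <= 0:
        return False  # timings too small to say anything
    return highest / lowest <= 1 + tolerance


if __name__ == "__main__":
    sizes = [200_000, 400_000, 800_000]
    timings = measure(lambda n: sum(range(n)), sizes)
    print(per_item_costs(sizes, timings), looks_linear(sizes, timings))
```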

Cheers,                     www.indiegamedesign.com
Brandon Van Every           Seattle, WA

"The pioneer is the one with the arrows in his back."
                          - anonymous entrepreneur