[Shootout-list] main benchmark

Brent Fulgham bfulg@pacbell.net
Mon, 27 Sep 2004 10:52:44 -0700 (PDT)


--- "Brandon J. Van Every" wrote:
> Within reason, I can't abide laziness as a design
> criterion.  I mean really, you either want to 
> accomplish something with the Shootout or you
> don't.  I'm getting the feeling that you think the
> Shootout 'is pretty neat' and that's where you're 
> going to leave it.  It is somehow fulfilling your 
> goals as it stands.  It doesn't fulfil any of my
> commercial goals, other than being
> yet-another-data-point-somewhere-on-the-internet.

The original goal of the shootout was "To Have Fun".

My goal with the revived shootout is to understand
how the myriad available languages compare in features
and performance.  I think the Shootout is succeeding
in this goal, and I think (based on the e-mail I get)
that most people agree.

Insofar as the goal of understanding features and
performance matches your so-far-unspecified commercial
goals, great.  But if your idea is that we discard
various data points because they don't fall into the
matrix of ideas you want to get across to some
audience, then I think we may have incompatible goals.

I guess I view the Shootout as an impartial experiment
that attempts to advance the state of the art, and to
help provide a framework for language comparison.  I
have every reason to believe that this kind of work
has value to the world at large, and have not seen any
convincing argument to the contrary.

> Pretty much the choice is between the Shootout as it
> is now, or a Shootout more in line with commercial 
> benchmarks like Viewperf, http://www.spec.org, 

I took a quick look at the Viewperf stuff at this
site.  I'm not quite sure what you feel we are
missing.  I would categorize Viewperf as a very large
'task' based benchmark, similar to the 'spellcheck'
test (though of course, much larger).  Most of our
tests are microbenchmarks designed to test this or
that aspect of a language implementation.  Instead of
Viewperf, our equivalent test would be something like
"cost of bump mapping on a convex hull".

What are they providing that we are not?  Is it just
that you want there to be a single score that can be
compared for all languages?

What if we provided a BARC (Brandon's Arbitrarily
Reduced Criteria) screen that showed each language's
score?  Would that be sufficient?
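
For concreteness, a single score would presumably be
some aggregate of the per-test results.  Here is a
minimal sketch of how such a reduction might work; the
benchmark names, the made-up timings, and the
geometric-mean aggregation (roughly what SPEC does)
are all assumptions for illustration, not anything the
Shootout computes today:

    # Hypothetical "single score" reduction (Python).
    # Benchmark names, timings, and the geometric-mean
    # aggregation are illustrative assumptions, not the
    # Shootout's actual scoring.
    from math import prod

    # Wall-clock times in seconds (lower is better);
    # these numbers are made up for illustration.
    TIMES = {
        "gcc":    {"nbody": 1.0,  "spellcheck": 2.0, "wordfreq": 0.5},
        "ocaml":  {"nbody": 1.3,  "spellcheck": 1.8, "wordfreq": 0.7},
        "python": {"nbody": 40.0, "spellcheck": 6.0, "wordfreq": 3.0},
    }

    def single_score(times, baseline):
        # Geometric mean of per-test speedups relative to a
        # baseline language; the geometric mean keeps any one
        # microbenchmark from dominating the composite.
        ratios = [baseline[test] / t for test, t in times.items()]
        return prod(ratios) ** (1.0 / len(ratios))

    baseline = TIMES["gcc"]
    for lang in sorted(TIMES, key=lambda l: -single_score(TIMES[l], baseline)):
        print(f"{lang:8s} {single_score(TIMES[lang], baseline):.2f}")

Note that the choice of baseline language and of
aggregation changes the ranking, which is part of why
a single composite number can obscure exactly the
per-test detail the Shootout exists to show.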

We could also include BITE (Brent's Inclusive Test of
Everything) as an alternative metric, and we could
decide whether Brandon's BARC is better or worse than
Brent's BITE.  :-P

-Brent