[Shootout-list] Directions of various benchmarks

Tue, 17 May 2005 05:02:28 +0100

On Tuesday 17 May 2005 04:35, skaller wrote:
> On Tue, 2005-05-17 at 10:53, Jon Harrop wrote:
> > I disagree. Anything which fits well into an interpreted language (e.g.
> > regexps in Perl) will be both fast and succinct.
>
> How interesting is it to compare Perl/PCRE with C/PCRE
> with Python/PCRE with Ocaml/PCRE .. is it interesting to
> compare PCRE with PCRE??

Why does the OCaml implementation of regex current use Markus' PCRE rather 
than OCaml's built-in Str module?

> IMHO it is more interesting to compare Felix regexps with PCRE:
> Felix has *real* builtin regexps which Perl does NOT, and they
> have 'in principle' O(1) execution speed compared to Perl's
> exponential worst case, at the cost of exponential memory..
> and I have no idea if they're actually faster on significant
> or nasty problems.

Good idea.

> > General code will, of
> > course, be much slower. But the challenge for the interpreted-language
> > authors is then to shoe-horn all problems into the limited capabilities
> > of the language. This is an accurate reflection of (some) real-world
> > programming.
>
> Isn't it the same for compiled languages:
> trying to shoehorn the problem into a way the compiler handles
> efficiently: eg tail recursive vs. nontail recursive list
> handling in an FPL, writing C code in a way that
> C compilers will optimise, etc.

Yes, I suppose it is the same problem, although there is a huge performance 
hit for not using built-ins from an interpreted language but a comparatively 
small hit for using non-tail recursion in a compiled language.

> > > For example, Python and Perl have hash tables built in. There is a
> > > big difference between implementing an algorithm using a builtin
> > > hash table, and implementing one that needs a binary tree.
> >
> > Yes, and this is something the shootout should illustrate, not hide.
>
> Yes. However it won't do that by emphasising performance over
> everything else. Somehow I would like to see much more emphasis
> on clarity of code and developer time. perhaps by CRAPS weighting
> LOC equal to speed by default (last time I tried that GHC
> won the Shootout hands down).

I expect everyone would agree. The problem is that clarity of code and 
developer effort are inherently qualitative. LOC is much more objective and 
quantitative but still suffers from "do we or don't we count lines with a 
single brace on in C/C++/Java/C#" etc.?

> > For different implementations of arithmetic-intensive code, no. The
> > algorithms used will have a bigger effect than compiler switches. This
> > comes into play when verbose languages start to exceed 100 LOC and have
> > to be cut down, rather than optimised.
>
> Yes, but you're agreeing with me, only you're relying on a
> 100 LOC limit, whereas I'd prefer to see the LOC reflected
> directly in the defaut craps weighting.
>
> I wouldn't exclude a 200 line C solution that outperformed
> Ocaml's 50 line solution -- but I'd make sure that the overall
> rankings reflected the tradeoff.

I think there is a place for both ~100- and variable-LOC benchmarks. The 
former can illustrate the performance benefits of a succinct language whereas 
the latter can illustrate the brevity benefits of a succinct language.

I like the idea of asking authors to emphasise performance when <100 LOC but 
emphasise brevity when >100 LOC. Then it is a case of having some tests (like 
pidigits) for which >20 lines of Haskell is of no benefit and other tests 
(like the ray tracer) for which 100 lines of OCaml can go a lot further than 
100 lines of C.

> Hmm .. I wonder about a performance vs. loc picture?
> This would be a two dimensional grid with an X for each
> language..

That's an absolutely excellent idea! It sounds pretty easy to feed in timing 
vs LOC data for each benchmark for each language and plot the resulting graph 
in color. That would make a great summary for the front page. :-)

Incidentally, if all of the current benchmark results could be made 
downloadable in some simple format (not XML!) then I wouldn't mind writing 
some 3D visualisation programs to study the results. It would be interesting 
to dissect the time vs memory vs LOC in 3D. :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists