[Shootout-list] Directions of various benchmarks

José Bollo jose.bollo@tele2.fr
Wed, 18 May 2005 15:24:54 +0200


On Wednesday 18 May 2005 at 13:51, skaller wrote:
> On Wed, 2005-05-18 at 17:19, José Bollo wrote:
> > > Bottom line of argument: "same way" tests make no sense
> > > because by their very nature they defeat the whole purpose
> > > of comparisons, to compare how similar problems are
> > > solved in DIFFERENT ways .. and what the consequences
> > > for performance, code clarity, ease of use, etc, are.
> >
> > that would be a long debate, here is my humble opinion: relaxing the
> > 'same way' will improve the competition and then every implementer will
> > copy the algorithm of the fastest realisation. So the 'no same way' will
> > tend to 'same way'.
>
> In that case, the test should be thrown out as a bad test.
>
> I agree this will happen to some extent and it should.
> However, consider just one example: a regexp search.
>
> Perl uses its builtin regexp engine. Ocaml uses Str. C hand codes
> the solution using char *. Felix uses the builtin regular
> expression facility.
>
> What happens: the Perl solution is terse, performs moderately,
> but is hard to read. The Ocaml solution is similar but a little
> faster because Leroy is smarter than Wall, but a little
> harder to read because it needs extra \ because there
> is no 'regexp' mode for strings. The C code is extremely
> fast but huge and unmaintainable. The Felix solution is almost as
> fast as the C solution, uses the clearest notation, but is
> a bit longer than the Perl.
>
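[Interjecting to make the backslash point concrete: this is roughly what
the Str flavour looks like. It is only a sketch of my own, not taken from
any shootout entry, and the pattern is invented.]

  (* str_backslash.ml -- illustrative only.
     Build: ocamlfind ocamlopt -package str -linkpkg str_backslash.ml *)
  let () =
    (* Str's group syntax is \( ... \), and every backslash must itself be
       escaped inside an OCaml string literal, hence the doubled \\ noise *)
    let re = Str.regexp "\\([0-9]+\\)-\\([0-9]+\\)" in
    let line = "range 10-25 found" in
    try
      ignore (Str.search_forward re line 0);
      Printf.printf "from %s to %s\n"
        (Str.matched_group 1 line) (Str.matched_group 2 line)
    with Not_found ->
      print_endline "no match"
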
> So here we have 4 distinct solutions to the same problem,
> with 3 radically different approaches: the Perl and Ocaml
> solutions both use a builtin NFA, but they're distinct implementations.
>
> Now suppose you go and insist on 'do it the same way': someone disallows
> the Felix solution because it uses a regular definition system
> without captures, and the C solution is 'obviously out' because
> it implements the state automaton by hand.
>
> This is absurd. All 4 solutions should be allowed, C should
> be penalised for being verbose and unmaintainable, Perl and Ocaml
> should be given credit for terseness but penalised for being
> somewhat unmaintainable (Ocaml being hurt more than Perl because
> Perl has special lexical support for regexps), Felix is penalised
> for being verbose but gets points for being clear and maintainable,
> and C and Felix get points for hands down clobbering both Ocaml
> and Perl because they both generate extremely fast low level
> linear machine code.
>
> For a problem without captures but a complex regexp,
> if Felix doesn't win this competition the test is flawed.
> For a simple regexp, Perl should win. For a heavy search,
> C and Felix should win.
>
> The point is that the languages are NOT all copying each
> other's algorithms, and they're going to give different
> results entirely on speed, depending on both the regexp
> being processed and on the data being processed.
>
> Ocaml can actually generate a DFA dynamically, but that
> takes too many lines to include -- I know, because
> that code is precisely what the Felix compiler uses
> to generate the transition matrix. In practice
> the Ocaml user wanting high performance would
> use ocamllex to generate the automaton,
> because there is C code to process it faster
> than native Ocaml could.
>
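[An aside, to show what "use ocamllex to generate the automaton" means in
practice: the little specification below is a sketch of my own -- the
pattern and the file name are invented, not taken from any benchmark.
ocamllex compiles the rule into a table-driven automaton in plain OCaml.]

  (* count.mll -- count, on stdin, the tokens matching a fixed regexp.
     Build: ocamllex count.mll && ocamlopt count.ml -o count *)
  rule count n = parse
    | ['a'-'z']+ "ing"  { count (n + 1) lexbuf }  (* a hit: bump the counter  *)
    | _                 { count n lexbuf }        (* skip any other character *)
    | eof               { n }                     (* end of input: the total  *)
  {
    let () =
      Printf.printf "%d\n" (count 0 (Lexing.from_channel stdin))
  }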

That is a pretty good example.
So the RegExp bench has to stay: it is good for a language to offer regexps.
But only a small subset of regexp features is exercised in the bench.
I would like to see the use of (?<...), (?!...), (?:...), \2, ...
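A bench along those lines could, for instance, require a pattern such as
the sketch below. (It uses the third-party ocaml-pcre binding; I am
quoting its functions from memory, so take the exact names as an
assumption, and the pattern itself is invented.)

  (* pcre_features.ml -- one pattern exercising a lookbehind, a negative
     lookahead, a non-capturing group and a backreference.
     Needs the ocaml-pcre library. *)
  let () =
    let rex = Pcre.regexp "(?<=<)(?!/)(\\w+)(?:\\s[^>]*)?>.*</\\1>" in
    let s = "<b class=\"x\">bold</b>" in
    if Pcre.pmatch ~rex s then begin
      let groups = Pcre.extract ~rex s in
      (* groups.(0) is the whole match, groups.(1) the captured tag name *)
      Printf.printf "tag = %s\n" groups.(1)
    end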

>
> So I accept completely your argument, but not as an argument
> in favour of 'same way' requirements, but as an argument
> in favour of throwing out most tests that lead to similar results
> for all languages, given 'same thing' arguments (that is,
> tests that simply require a correct result no matter
> how it is obtained).

Yes, agreed.
But it seems that it will cost more effort for each of the contributors.
Anyway, it is open, so ... many contributors can play here.