[Shootout-list] Many tests cheat

Einar Karttunen ekarttun@cs.helsinki.fi
Wed, 3 Nov 2004 11:13:44 +0200


On 03.11 19:34, skaller wrote:
> > But we don't want functional arrays in the tests and the languages
> > provide ways to use non-functional arrays, so what is the point. 
> 
> The point is that when you say 'measures array access'
> I am saying the specification is unclear what an array
> actually is. In general, that requires random O(1) read/write
> access which cannot be provided in general in an FPL.

I think that most people understand with an array something that provides:
* a fast access to a from a finite subset of N to some values.
* in addition arrays may be mutable which means their contents can be changed

O(1) read access is trivially possible in functional languages 
(think about N-tuples). What is hard is updates.

Now even read-only arrays *are* arrays.

> But in general, a suitable test must fail pure FPLs
> because they can't provide arrays. [I mean,
> there really *should* be one test which knocks
> FPLs hard]

Have you read the paper about ST arrays? 
http://research.microsoft.com/Users/simonpj/Papers/state-lasc.ps.gz

> The way to fail them isn't to argue about what an array
> looks like in the language. It has to be by bounding
> time and memory so that contigous storage is the only
> viable solution. That makes the test truly independent
> of the language, and independent of any argument about
> what an array is for that language, and whether
> it meets the constrains or not -- an operational constraint
> simply fails the test by exceeding bounds during the
> testing process.

Well terminology varies by language, but saying array rather 
than a more involved definition makes the most sense for 
most people. If someone has a datastructure that fares 
better than standard arrays then I would be very interested...

> Yes indeed, but there are circumstances where C mutable
> arrays will provide a solution that cannot be coded
> in a pure FPL (by which I mean suitably fast).

Monads are pure.

> Obviously, any *particular* C solution has a functional
> equivalent which can run at the same speed, but will not
> always be possible to mechanically find that with FPL
> optimiser -- if it were possible, FPLs would rule!
> But I suspect its the halting problem  .. :)

One can translate C to the equivalent with a suitable monad 
which is just slower (a constant factor most likely), so this
is not the problem. It will be more ugly than the C code 
though...

> Yes, and some problem will fail to have a suitable typing.

The classic dynamic/static typing thing... Of course if I 
want to be evil I can do unsafe typecasts with GHC. But the
same argument goes for imperative languages with strong 
typing.

> Basically this has to be the case, otherwise there
> would be no point trying to get fast functional arrays,
> you'd just use the functional ones that looked mutable
> all the time and performed like C arrays.

That is also about beauty..

 
> That's really the point -- I don't understand the
> intended constraint, any other way than it meaning
> O(1) performance for random read/write .. and so that's
> what really should be stated .. this has the advantage
> of annoying Brent by needing extra checking on the
> run time behaviour, but it eliminates any possible
> way of cheating, including non-deliberate cheating.

Actually requiring O(1) is not necessary. If some entry is worse
than O(1) it should not fare very well in the results. The 
array is there because it is intuitive for most people.

> I don't want to be extreme here -- I'm just suggesting
> a move to tests more based on observables where any
> program is ok if it gives the right answer within
> memory and time limits, rather than arguing if 
> a test is 'fair' or not. That's why the topic
> 'many tests cheat' .. it isn't so easy to say if 
> a test cheats of not sometimes.

This is mostly a problem with the do X n times and return the 
result of the last time. This is not generally considered good,
but I don't have the time/energy to change all the entries.

> The example of Felix Exception test is a good one
> (raised by Isaac Gouy).

I think the problem of that is that other languages could use 
the same approach as Felix which would improve their results.
e.g. that could be translated straight into C making it fare
better. 

> Here, the question is 'what do you mean by *native*
> technique'. The answer isn't so obvious for Felix:
> embedding C/C++ is definitely a native technique,
> there are more constructions for doing that than 
> anything else I think. Still, the requirement
> for method calls test isn't for using native embedding,
> and the native *felix* code is doing ordinary function
> calls. .. of course Python does that too, methods are
> also just functions bound to some object... no Isaac I
> don't want an argument .. I'm saying you could legitimately
> disagree and that is the point: it just isn't clear
> given a novel solution to a problem what the meaning 
> of a specification which uses terminology designed
> for more conventional systems actually means.

I use the following methology with submitting tests:
* If my implementation appears to be fair commit it
* If my implementation beats the crap out of other implementations 
  post it to the list asking "do you think this is fair?"

- Einar Karttunen