[Shootout-list] ackermann, values for ''n''

Bengt Kleberg bengt.kleberg@ericsson.com
Wed, 08 Jun 2005 10:52:45 +0200


On 2005-06-08 09:30, John Skaller wrote:
...deleted
> The process seeks the largest 'n' value for each language
> independently. Also the smallest 'n' value. However data

very good.


> This means you could mechanically calculate the largest
> 'n' for which all language didn't timeout, or perhaps
> more sensible, for which 80% of languages didn't timeout.
> Actually, you could just say 30% of languages didn't timeout,
> and only include them in the tables/graphs -- excluding
> the slowest 70% of languages.

preferably the time out should be high enough to make sure that for an 
''n'' every language can get an allowed run time. (an allowed run time 
is not too short, not too long)

if that is not possible then my suggestion is to take the ''n'' value 
that has the largest number of allowed run times. i.e., say we have 
''n'' of 10, 100, 1000 and 10000.
''n'' equal to 10 makes 90% of the languages run too fast, and 0% run 
too slow.
''n'' equal to 100 makes 70% too fast and 10% too slow.
''n'' equal to 1000 makes 20% too fast and 20% too slow.
''n'' equal to 10000 makes 0% too fast and 90% too slow.
take ''n'' equal to 1000. (for the sake of argument i here assume that 
500 or 2000 would not be better than 1000).
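the selection rule above could be sketched roughly like this (the cutoffs 
and timing numbers are invented for illustration, not actual shootout data):

```python
# Hypothetical sketch: for each candidate n, count how many languages
# finish inside the allowed run-time window (neither too fast nor too
# slow), then pick the n with the highest count.

TOO_FAST = 0.5    # seconds; below this a run tells us little (assumed cutoff)
TOO_SLOW = 300.0  # seconds; above this the run times out (assumed cutoff)

# run_times[n][language] = measured wall-clock seconds (made-up data)
run_times = {
    10:    {"c": 0.1, "java": 0.2, "python": 1.0},
    1000:  {"c": 2.0, "java": 5.0, "python": 400.0},
    10000: {"c": 250.0, "java": 600.0, "python": 9000.0},
}

def allowed_count(times):
    """Number of languages whose run time falls inside the allowed window."""
    return sum(1 for t in times.values() if TOO_FAST <= t <= TOO_SLOW)

# the n with the most allowed run times wins
best_n = max(run_times, key=lambda n: allowed_count(run_times[n]))
```

with the made-up numbers above, ''n'' = 1000 has two allowed run times 
while 10 and 10000 have one each, so it would be chosen.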


...deleted
> [You could actually RUN the code I published!!]

this is true. but i would then only find out what the code does, not 
what you intend it to do. there could be a difference. (this was meant 
to be both correct and slightly funny)


>>i am assuming that you mean that ''a test process'' is all the languages 
>>for a test, or all the tests for a language, etc. if you only mean a 
>>single language and a single test, then my statement is not applicable.
> 
> 
> I mean neither. The current code I have runs tests randomly:
> by 'test' I mean a single execution of a program with 
> a single value of n.

ok. i now understand what a test is. but i then fail to understand what 
you meant by ''if a test fails after 90%''.


...deleted
> I argue that random testing and measurement of real
> clock times on a machine which is randomly loaded
> with other processes .. typing emails, compiling programs,
> running cron jobs like updatedb .. is actually a realistic
> and useful measure, probably more useful than running
> the tests in a clean-room (a machine unloaded by any

i agree. it is realistic and useful _for that one test_.

but for more than one test i do not agree. i think that if i want to 
compare two tests with the same task to perform (same algorithm and 
''n''), but different languages, then having different loads makes the 
comparison less useful.
imho.


> other work) -- the actual results in both cases
> are just as reliable, however the conditions for
> doing the clean-room measurements are much harder
> to obtain and very expensive and arguably not 
> applicable to the real world. However the results

if we have a spare computer not doing anything else (not uncommon, in my 
experience) then it would be neither harder nor more expensive. imho.
the same goes if i have a computer that does nothing for a longish 
period of time (like overnight).


> can be obtained faster this way, meaning one can
> tolerate much lower numbers of tests and expect
> most results to cluster tightly around the average,
> wheres the 'dirty' testing procedure will have a much
> larger spread of results and has to run much longer
> to get meaningful comparisons. However it can

i agree. with lots of random tests it sounds as if it would be possible 
to eliminate the different loads from the metrics. unfortunately i do 
not know enough statistics to argue this point.
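the statistical idea can be sketched with simulated data (the timings 
below are invented, not measurements): if background load only inflates 
some fraction of the runs, a robust statistic like the median stays 
close to the clean-room time, while the mean gets pulled up.

```python
# Hedged sketch with simulated timings: ~20% of runs are assumed to be
# slowed down by other processes; the rest take the "clean-room" time.
import random
import statistics

random.seed(42)

TRUE_TIME = 10.0  # assumed clean-room run time, in seconds

# Simulate 200 randomly scheduled runs; a loaded run gains 1-20 extra seconds.
samples = [
    TRUE_TIME + (random.uniform(1.0, 20.0) if random.random() < 0.2 else 0.0)
    for _ in range(200)
]

mean_estimate = statistics.mean(samples)      # pulled up by the loaded runs
median_estimate = statistics.median(samples)  # close to the clean-room time
```

since fewer than half the runs are loaded, the median lands on the 
clean-room value, which is the intuition behind needing many random 
tests on a dirty machine.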


bengt