[Shootout-list] Directions of various benchmarks

Mon, 23 May 2005 09:53:25 +0200

On 2005-05-20 19:07, Brent Fulgham wrote:
> --- Bengt Kleberg <bengt.kleberg@ericsson.com> wrote:
> 
>>i can help write the benchmark specifications. i
>>only need to finish working on the most important 
>>project for the shootout first. it is proving to be
>>a lot more complicated than i thought to get more 
>>''n'' for each test.
> 
> 
> How many different values of N would you like to
> see for the tests?

i would like to see a sufficient number, and spread, of ''n'' to make all 
languages exhaust the available resources, iff (that used to mean ''if, 
and only if'') the test is designed to allow resources to be exhausted. 
otherwise ''n'' only needs to make all implementations reach a 
sufficiently long run time. i think that sufficiently long is when the 
run time variation is below some percentage. i am happy with 10%, but 
others might have a better value.
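
a minimal sketch of the ''variation below some percentage'' check, in python. the definition of variation used here (spread between slowest and fastest run, divided by the mean) is an assumption, as are the function names; other definitions would work equally well.

```python
def relative_variation(times_ms):
    """Spread of repeated timings as a fraction of their mean (assumed definition)."""
    mean = sum(times_ms) / len(times_ms)
    if mean == 0:
        # every run was too fast to measure, so treat the timing as unstable
        return float("inf")
    return (max(times_ms) - min(times_ms)) / mean


def long_enough(times_ms, max_variation=0.10):
    """True when repeated runs at the same n agree to within max_variation."""
    return relative_variation(times_ms) <= max_variation
```

three runs of 1000, 1040 and 980 milliseconds pass the 10% limit, while runs of 10, 20 and 10 milliseconds do not.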

> Would some tests need more N than others (I would
> imagine some tests need a higher resolution to show
> variations in performance)?

this is a good question. i do not know. if this is a resource conserving 
test it would need the ''n'' to reach a sufficiently long run time. this 
might be different from finding resource exhaustion.

start with n=1 (or 10, or 100, or 1000 depending upon the test). 
increase with either fixed addition (eg ackermann), or with fixed 
multiplication. for the multiplication, once we get over the time out 
limit, we go back by halving ''n'' until we have arrived below the time out.
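
the multiply-then-halve search can be sketched like this in python. it is not the script that produced the numbers in this mail, only an illustration of the description above. `find_n`, `run_ms`, `factor` and `max_steps` are made-up names, and `run_ms` stands for whatever runs the test at a given ''n'' and returns milliseconds.

```python
def find_n(run_ms, timeout_ms=2000, start=1, factor=10, max_steps=64):
    """Grow n by a fixed factor until a run exceeds the timeout, then halve back.

    Returns the final n and the list of (n, milliseconds) pairs that were tried.
    """
    n = start
    ms = run_ms(n)
    history = [(n, ms)]
    steps = 1
    # growth phase: fixed multiplication until we get over the time out
    while ms <= timeout_ms and steps < max_steps:
        n *= factor
        ms = run_ms(n)
        history.append((n, ms))
        steps += 1
    # back-off phase: halve n until we are below the time out again
    while ms > timeout_ms and n > 1:
        n //= 2
        ms = run_ms(n)
        history.append((n, ms))
    return n, history


def fake_run_ms(n):
    """Made-up linear cost model (1.5 ms per 1000 units of n), for illustration only."""
    return n * 3 // 2000
```

with the made-up cost model, `find_n(fake_run_ms)` climbs from 1 to 10000000, goes over the 2 second time out, and halves back three times to settle on n = 1250000.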

here is how this looks for some assorted scheme implementations doing 
word count, with time out at 2 seconds and minimum at 1 second (10% 
variation allowed). the numbers on the line after ''run'' are the verbose 
output. the numbers on the lines below ''run'' are:
n	milliseconds	erroneous-memory-usage

bigloo wc compile
0 400 0
bigloo wc run  0-0-1- 0-2-10- 1-21-100- 14-131-1000- 147-1206-10000- 
1470-11964-100000- 14692-121522-1000000- 30361-251430-10000000- 
30361-251430-12500000- 30361-251430-6250000-
1 0 0
10 0 0
100 10 0
1000 10 0
10000 10 0
100000 20 0
1000000 200 0
10000000 1940 0

gauche wc run  0-0-1- 0-2-10- 1-21-100- 14-131-1000- 147-1206-10000- 
184-1505-12500-
1 30 0
10 20 0
100 30 0
1000 40 0
10000 650 0
12500 950 0

guile wc run  0-0-1- 0-2-10- 1-21-100- 14-131-1000- 147-1206-10000- 
1470-11964-100000- 7346-60751-500000- 3674-30331-250000-
1 80 0
10 100 0
100 90 0
1000 120 0
10000 180 0
100000 690 0
250000 1510 0

mzscheme wc run  0-0-1- 0-2-10- 1-21-100- 14-131-1000- 147-1206-10000- 
1470-11964-100000- 14692-121522-1000000- 18363-152066-1250000-
1 40 0
10 40 0
100 40 0
1000 40 0
10000 70 0
100000 190 0
1000000 1500 0
1250000 1860 0

scsh wc run  0-0-1- 0-2-10- 1-21-100- 14-131-1000- 147-1206-10000- 
1470-11964-100000- 3674-30331-250000- 1838-15069-125000-
1 80 0
10 80 0
100 80 0
1000 90 0
10000 180 0
100000 1080 0
125000 1320 0
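
for anyone who wants to process these listings, here is a small python sketch that picks out the three-column rows and skips the verbose lines. the function names are made up, and the sample is the guile listing copied from above.

```python
def parse_rows(text):
    """Turn 'n milliseconds memory' lines into a list of int triples.

    Lines that are not exactly three all-digit fields (the ''run'' line and
    its wrapped verbose output) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields))
    return rows


def largest_n_within(rows, timeout_ms=2000):
    """Largest n whose run time stayed at or below the time out, or None."""
    ok = [n for n, ms, _mem in rows if ms <= timeout_ms]
    return max(ok) if ok else None


GUILE = """\
guile wc run  0-0-1- 0-2-10- 1-21-100- 14-131-1000- 147-1206-10000-
1470-11964-100000- 7346-60751-500000- 3674-30331-250000-
1 80 0
10 100 0
100 90 0
1000 120 0
10000 180 0
100000 690 0
250000 1510 0
"""

rows = parse_rows(GUILE)
print(largest_n_within(rows))
```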

> Can you (or perhaps everyone) give me a feel for how
> much we are talking about.  It's no big deal to add
> a few more iterations to the run (it will just mean
> the shootout will take longer to run).
> 
> But there are practical limitations.  If we want
> to have 1,000 different values of N for each test,
> then we move from a shootout run that takes roughly
> 24 hours to something that could take weeks to run
> (since run times increase as N increases).

if we need a total run time of 24 hours i think we could calculate 
backwards to get a value for how many ''n''.

24 hours is 24 * 3600 seconds.
we have about 60 languages => 24 * 60 seconds per language
we have about 30 tests => 24 * 2 seconds per test

so 48 seconds have to be enough for all ''n''.

relax the all languages or all tests rule and we get more seconds.
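
the backwards calculation is easy to redo for other numbers of languages and tests. a small python sketch, where the function name is made up and the defaults are the rough figures from this mail:

```python
def seconds_per_test(total_hours=24, languages=60, tests=30):
    """Time budget for one test in one language, covering all values of n."""
    total_seconds = total_hours * 3600
    return total_seconds / (languages * tests)


print(seconds_per_test())               # every language, every test
print(seconds_per_test(languages=30))   # relaxed: only half of the languages
```

halving the number of languages (or of tests) doubles the seconds available for each remaining combination.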

it seems as if the small values for ''n'' do not take much time, at least 
not in this test. perhaps only the runs that reach time out time are 
worth counting. they are at a minimum 2.

bengt