[Shootout-list] ackermann, values for ''n''

Einar Karttunen ekarttun@cs.helsinki.fi
Wed, 08 Jun 2005 12:32:52 +0300


John Skaller <skaller@users.sourceforge.net> writes:
> I argue that random testing and measurement of real
> clock times on a machine which is randomly loaded
> with other processes .. typing emails, compiling programs,
> running cron jobs like updatedb .. is actually a realistic
> and useful measure, probably more useful than running
> the tests in a clean-room (a machine unloaded by any
> other work) -- the actual results in both cases
> are just as reliable, however the conditions for
> doing the clean-room measurements are much harder
> to obtain and very expensive and arguably not 
> applicable to the real world. However the results
> can be obtained faster this way, meaning one can
> tolerate much lower numbers of tests and expect
> most results to cluster tightly around the average,
> whereas the 'dirty' testing procedure will have a much
> larger spread of results and has to run much longer
> to get meaningful comparisons. However it can
> aggregate results sensibly, and eliminates one
> source of systematic error (cache preloading,
> cache times being counted, but arbitrarily excluding
> system overhead). Instead of excluding things like
> disk-to-memory load times, the dirty process
> averages them away.

I think clean-room testing is the best approach. There are fewer sources
of noise in a clean-room environment than in a loaded one. In a loaded
environment the errors can also be systematic and machine-dependent.
Also, in a loaded environment much depends on how well the OS virtual
memory and scheduling policy happens to play along with the language VM,
which in turn depends on the amount of memory and the speed of the
processor...

Adding more unknown variables makes results harder to analyze.

- Einar Karttunen