[Shootout-list] fannkuch (timer resolution; HZ=1000?)

Fri, 20 May 2005 21:41:05 +0100

On Friday 20 May 2005 20:52, Brent Fulgham wrote:
> --- Jon Harrop <jon@ffconsultancy.com> wrote:
> > Two bigger problems are that most modern languages
> > will add much bigger random noise and a bias. OCaml
> > running times often vary by 0.1s and I've heard
> > (from Microsoft) that C# does poorly in the shootout
> > because it has a big startup time.
>
> C# doesn't do so poorly.  It beats the pants off of
> Java (meaning Sun's HotSpot version) in almost all
> tests.

Well, we can quantify this by looking at how long languages take to run their 
fastest benchmarks. Look at gcc first:

{0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 
0.01, 0.01, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.03, 
0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.04, 0.04, 0.04, 0.04, 0.04, 
0.04, 0.04, 0.04, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 0.06, 
0.06, 0.06, 0.07, 0.07, 0.07, 0.07, 0.07, 0.08, 0.08, 0.08, 0.08, 0.09, 0.09, 
0.1, 0.1, 0.11, 0.11, 0.11, 0.11, 0.11, 0.12, 0.12, 0.12, 0.12, 0.12, 0.13, 
0.13, 0.14, 0.14, 0.14, 0.15, 0.15, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.2, 
0.22, 0.22, 0.25, 0.25, 0.26, 0.26, 0.26, 0.26, 0.28, 0.29, 0.29, 0.29, 0.29, 
0.3, 0.3, 0.3, 0.33, 0.33, 0.36, 0.36, 0.36, 0.38, 0.38, 0.39, 0.39, 0.5, 
0.53, 0.57, 0.61, 0.79, 0.84, 0.85, 0.87, 0.88, 0.9, 0.92, 1.06, 1.11, 1.17, 
1.3, 1.49, 1.58, 1.84, 1.95, 2.24, 2.26, 2.42, 2.56, 2.87, 2.87, 2.98, 3.35, 
4.66, 10.18}

Now look at C# and, in particular, at the distribution of the fastest run 
times (all are >0.1s):

{0.11, 0.11, 0.12, 0.12, 0.12, 0.12, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13, 
0.14, 0.14, 0.14, 0.14, 0.14, 0.14, 0.15, 0.15, 0.15, 0.16, 0.16, 0.16, 0.16, 
0.16, 0.16, 0.16, 0.17, 0.17, 0.17, 0.17, 0.17, 0.18, 0.18, 0.18, 0.18, 0.19, 
0.19, 0.21, 0.21, 0.22, 0.22, 0.22, 0.22, 0.23, 0.23, 0.24, 0.24, 0.25, 0.26, 
0.26, 0.27, 0.27, 0.29, 0.29, 0.31, 0.31, 0.32, 0.32, 0.32, 0.33, 0.33, 0.34, 
0.34, 0.34, 0.35, 0.36, 0.37, 0.4, 0.4, 0.41, 0.42, 0.42, 0.42, 0.43, 0.46, 
0.46, 0.47, 0.47, 0.49, 0.49, 0.5, 0.54, 0.55, 0.56, 0.58, 0.62, 0.64, 0.67, 
0.68, 0.71, 0.73, 0.74, 0.74, 0.74, 0.8, 0.83, 0.85, 0.86, 0.87, 0.87, 0.94, 
0.97, 0.98, 1., 1.03, 1.05, 1.06, 1.07, 1.07, 1.07, 1.19, 1.19, 1.2, 1.21, 
1.23, 1.28, 1.29, 1.3, 1.32, 1.43, 1.48, 1.55, 1.56, 1.61, 1.66, 1.7, 1.7, 
1.85, 1.9, 2.02, 2.06, 2.06, 2.11, 2.15, 2.27, 2.86, 3.6, 4.07, 4.61, 5.19, 
5.42, 5.49, 6.95, 7.18, 7.96, 8.71, 10.31, 10.75, 12.6, 13.71, 14.35, 15.08, 
19.01, 20.45, 27.43}

As you say, the situation is roughly twice as bad for Java (no run takes less 
than 0.18s):

{0.18, 0.2, 0.2, 0.22, 0.23, 0.24, 0.24, 0.24, 0.25, 0.25, 0.25, 0.26, 0.26, 
0.26, 0.26, 0.26, 0.26, 0.26, 0.26, 0.27, 0.27, 0.28, 0.28, 0.29, 0.29, 0.29, 
0.3, 0.3, 0.31, 0.31, 0.32, 0.33, 0.34, 0.34, 0.35, 0.35, 0.36, 0.36, 0.36, 
0.37, 0.39, 0.39, 0.39, 0.4, 0.41, 0.41, 0.41, 0.41, 0.41, 0.42, 0.43, 0.45, 
0.45, 0.45, 0.46, 0.46, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.53, 0.54, 0.54, 
0.55, 0.55, 0.56, 0.58, 0.67, 0.68, 0.74, 0.76, 0.77, 0.78, 0.79, 0.79, 0.83, 
0.93, 0.96, 0.99, 1.03, 1.06, 1.09, 1.09, 1.14, 1.15, 1.17, 1.2, 1.24, 1.29, 
1.33, 1.39, 1.43, 1.47, 1.47, 1.49, 1.6, 1.62, 1.62, 1.64, 1.68, 1.74, 1.78, 
1.8, 1.83, 1.84, 1.9, 1.94, 1.97, 1.98, 2.01, 2.05, 2.11, 2.18, 2.26, 2.29, 
2.3, 2.31, 2.47, 2.55, 2.61, 2.7, 2.73, 2.76, 2.94, 2.97, 3.17, 3.23, 3.41, 
3.76, 4.36, 4.68, 5.72, 6.69, 6.78, 8.01, 8.37, 11.19, 15.57, 16.75, 17.01, 
25.1, 33.53, 159.62}

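So there is a systematic error in the results of about 0.1s for C# and about 
0.2s for Java, judging by the fastest run times above (0.11s and 0.18s, 
against 0.01s for gcc).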
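
A minimal sketch of that estimate, assuming the fastest run in each list is 
dominated by fixed startup cost rather than by the benchmark itself. The lists 
below are excerpts only (the first five entries and the last entry of each 
sorted list quoted above), and the function names are illustrative, not part 
of any shootout tooling:

```python
# Excerpts of the sorted run-time lists quoted above (seconds):
# first five entries and the last entry of each list, not the full data.
gcc_times = [0.01, 0.01, 0.01, 0.01, 0.01, 10.18]
csharp_times = [0.11, 0.11, 0.12, 0.12, 0.12, 27.43]
java_times = [0.18, 0.2, 0.2, 0.22, 0.23, 159.62]


def startup_floor(times):
    """Fastest observed run, used as a rough proxy for fixed startup cost."""
    return min(times)


def excess_over_baseline(times, baseline):
    """Difference between two floors, rounded to the 0.01s timer precision."""
    return round(startup_floor(times) - startup_floor(baseline), 2)


if __name__ == "__main__":
    for name, times in [("C#", csharp_times), ("Java", java_times)]:
        print(name, "floor above gcc:", excess_over_baseline(times, gcc_times), "s")
```

The minimum is only a crude estimator: it assumes at least one benchmark does 
almost no work, which the gcc list (many runs at 0.01s) suggests is the case.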

> > I thought it would be nice to do those Skaller plots
> > using only tests which ran for >0.1s or even >1s but
> > there is virtually no such data on the shootout
> > (i.e. the plots are all blank).
>
> So, we should probably up the 'N' on most tests until
> we can get about 0.5 seconds or so on the fastest
> performer.

Yes, even at 0.5s, almost half of the run-time is spent starting Java.
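
The arithmetic behind that can be sketched as follows, assuming a fixed Java 
startup cost of about 0.2s (the rough figure read off the list above, not a 
measured constant). Both function names are hypothetical:

```python
def startup_fraction(total_seconds, startup_seconds=0.2):
    """Share of a measured run time that is fixed startup cost."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    # A run cannot be more than 100% startup.
    return min(startup_seconds / total_seconds, 1.0)


def min_total_for_fraction(max_fraction, startup_seconds=0.2):
    """Shortest total run time that keeps startup below max_fraction."""
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    return startup_seconds / max_fraction


if __name__ == "__main__":
    print(startup_fraction(0.5))        # share of a 0.5s run spent starting up
    print(min_total_for_fraction(0.1))  # run time needed for startup <= 10%
```

Under that assumption a 0.5s run is 40% startup, and a run would need to last 
about 2s before startup fell to a tenth of the measured time.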

> Obviously, this causes problems for the slow
> languages (since they will most likely time out now).

Yes. As long as data are kept for smaller "n", I think the consequences are 
worth suffering in order to improve the benchmark for the better languages.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists