[Shootout-list] Coding For Speed

Jabari Zakiya jzakiya@mail.com
Thu, 24 Mar 2005 12:31:55 -0500


I'm bringing some issues raised in the
'Faster Ruby nbody Benchmark' thread here, and some
discussion from comp.lang.ruby, to expand the discussion=20
to other more general and philosophical issues.

=46rom the Shootout list Isaac Gouy said:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> > No, to manually unrolling loops!
> > > (Doug Bagley explicitly stated that manually
> > unrolling loops wasn't
> > interesting, I guess we need to add the same
> > exclusion.)
>
> I've been adding these approaches as "interesting
> alternatives", which I guess flies in the face of
> Doug's statement ;-)

Ho ho :-)

The Ruby program showed up as an 'accepted' program - not as an
alternative.

> Manually unrolling loops =3D=3D more LOC, so perhaps
> we should allow them (or as 'interesting
> alternatives' and let the LOC metric start to be taken into account?

LOC isn't even a good indicator of concise syntax!

For Shootout to work as some-kind-of comparison between language
implementations (rather than programmer effort or skill), the programs
need to be plain vanilla comparable.

Yes many languages have optimising compilers which will unroll loops -
the language implementation is doing the work, that's the point!
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

But from comp.lang.ruby is was noted (edited):

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
Newsgroups: comp.lang.ruby
From: Carlos <a...@quovadis.com.ar> - Find messages by this author
Subject: Re: n body problem

[Isaac Gouy <i...@yahoo.com>, 2005-03-23 17.34 CET]

> > Thus, I disagree that benchmarks are about applying some
> > arbitrary programming paradigm to a set of languages, just so
> > the "look" of the programs are similar.

> You're welcome to your opinion - and we're happy to include the program
> as an 'alternative'.

Don't you think this is a little unfair?

Maybe you didn't examine the program very well. The original author
(ab?)used the fact that a Struct can be used as an array. So, o.x can be
referred as o[0], o.y as o[1], and o.z as o[2]. But it is not necessary to
do so, and probably it is better to refer to its members by name.
....
....
Note that none of the other languages use arrays to represent the vector,
and none of them use loops to make these calculations.
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

In fact, all the functioning nbody benchmarks on the Shootout list (Ada,=20
C, C#, Fortran, GForth, Java, Lua, Oberon, OCaml, Scheme, and SML) perform =
direct implementations of the physics math. Thus, there is no issue about u=
nrolling loops with those versions, and they all establish the legitimacy o=
f "nonloop use" style as a programming paradigm, and as accepted benchmarks=
 for the Shootout.

When engaging in the exercise of "benchmarking" inherent in that
exercise (to me) is to identify coding techniques and paradigms
within the idiom of the language being used to perform the
stated tasks as fast as possible.=20=20

Benchmarking is not about predetermining an arbitrary coding
structure paradigm and unilaterally applying that paradigm
to a every language, irrespective of the negative consequences
it has to producing the fastest program in that language.

This does not seem to be the priority of the Shootout, and
leads me to the philosophical question: What you really=20
trying to accomplish with these benchmarks?

To me, a language is a tool that the user decides how to (best)
apply to the task at hand. I know I would learn alot more
about a language to see how an "expert" would create the
fastest performing program for a benchmark, than to just see
copied boilerplate code, which primarily shows the existence
and capabilities of predefined use of structures and approach.

You learn alot more about the use of a language when you
see how a very different programming paradigm, idiomatic of
the given language, produces "best" results.

Take Forth for example.

The given (corrected) Fibonacci code, which mimics others, is:

: fib (n-m)
  dup 2 < if exit else 1- dup recurse swap 1- recurse + then
;

But below is a more idiomatic Forth version, because it uses
a simpler conditional structure, and only does the needed minimum.
And, its faster!

: fib (n-m)=20
  dup 1 > if 1- dup recurse swap 1- recurse + then
;

But if you really don't know Forth, you'll just mimic another
version you understand, and may (probably) never ever consider
a better version, because you just aren't sufficiently=20
conversant in the language to conceive of better code.

The same issue affects Ruby (et al).

Ruby is a native dynamic interpretive language. A user, to
acquire the highest performance possible from the language, has
to understand the consequences of this fact. One thing this means
is that a user has to be "the compiler" at times, in order to
squeeze the highest possible performance out of the language.
Ruby is not C/C++, with highly optimized compilers, which do
a lot of the thinking for the programmer to extract performance
from source code.

Thus, as in the original Ruby nbody code, you could produce the=20
magnitude of a 3-dimensional vector as follows:

  def magnitude
    d =3D self.inject(0) {|a,v| a + v*v}
    sqrt(d)
  end

but the original code took 41 minutes to run on my system
(AMD K-7, 600Mhz, 640MB, Mandrake 10.1 Ofl, Ruby 1.8.2)

However, the inject method is not "performance optimum".

Even using loops, all the following are faster:

  def magnitude
    d =3D 0.0: self.each {|v| d +=3D v*v}
    d =3D 0.0: (0..2).each {|i| d +=3D self[i]**2}
    d =3D 0.0; 3.times {|i| d +=3D self[i]**2}
    sqrt(d)
  end

At this point, by optimizing the most performance efficient
Ruby looping structure, I got the benchmark task down from
41 minutes to 27 minutes. Then the little lightbulb in my
head turned on.

Instead of looking at the task as an exercise in the use of
programming structures, I looked at what was the physics that
was trying to be done. After I realized what the physics
was, I just wrote code to "directly implement" the physics.

  def magnitude
    x=3Dself[0]; y=3Dself[1]; z=3Dself[2]
    sqrt(x*x + y*y + z*z)
  end

Not only is this code accurate, its simple, understandable, AND
creates a MUCH faster program, which is the main point for a=20
benchmark. Using direct implementation of just 4 function methods
reduced the revised benchmark (which I posted) time to 17 minutes,
from the 'original code' time of 41 minutes.

Further testing has reduced the time to 15 minutes, using the
following code for the magnitude, tested to be the fastest so far.

  def magnitude
    sqrt(self[0]**2 + self[1]**2 + self[2]**2)
  end

Besides executing the fastest in Ruby 1.8.2, you can't
produce code any shorter and simpler than this. And every other
benchmark does this calculation in the same manner.

So by what codified set of programmnig rules can anyone say, with
sustainable validity, that this coding paradigm is inappropriate?

In fact, in the "real world" where money, time, efficiency, and
people's lives matter, "direct implementation" of functions, in
my experience, is standard procedure, especially where speed
matters. This is especially true with languages such as Forth,
my native software language.

Similarly, Ruby is a great language to get real work done with.=20
It isn't natively the fastest, but it needn't be naively coded=20
to create unnecessarily slow performance.

Thus, I disagree that benchmarks are about applying some
arbitrary programming paradigm to a set of languages, just so
the "look" of the programs are similar. Benchmarks should
illustrate how a given language can be used to perform a
given task (the benchmark) in that language, particularly
using the best programming techniques and idioms unique to
that language.=20

Thus, to me, the primary essence of benchmarking is comparing
how different languages can be uniquely applied to optimally
perform the same tasks, and comparing those results.

If there are other characteristics you are really trying to
compare between languages, it would be best to explicitly state
them (such as shortest program, etc) and not confuse issues

I hope these comments explain my point-of-view, and reasoning,=20
on this issue. I await to see others views and reasoning.

Jabari Zakiya

--=20
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm