[Shootout-list] word-frequency

John Skaller skaller@users.sourceforge.net
Mon, 20 Jun 2005 07:58:12 +1000


On Sat, 2005-06-18 at 11:48 +0200, Nicolas Neuss wrote:
> Hello,
> 
> after Eric Lavigne and I wrote the CMUCL code for the word-frequency
> benchmark, I looked at how some languages manage to be even shorter than
> our version.  As far as I can see, the main reason is that the
> requirements of the word frequency shootout are artificial (words have to
> be ordered in decreasing frequency AND reverse alphabetical order).  This
> allows for the trick of sorting the strings obtained by concatenating
> frequency and word.  IMO, this is a kludge because it only works for that
> unnatural alphabetical ordering.
> 
> Question: wouldn't it be more telling about the power of a language if the
> natural ordering (decreasing word frequency, correct alphabetical order)
> were required?  Why should one contort a good test to allow a kludge?
> 
> Any comments?

Unfortunately, changing the test will have absolutely no impact.
To get the natural ordering from the same concatenation trick,
all you do is reverse the polarity of the frequencies by
storing the value

	1000000 - n

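instead of n :)  (This assumes no count reaches 1000000 and that the
value is zero-padded to a fixed width, so that string order matches
numeric order.)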
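
To make the complement trick concrete, here is a minimal sketch in
Python.  It is not any of the shootout entries; the names LIMIT and
ranked are made up for illustration, and it assumes every count is
below 1000000.  A plain ascending string sort on the padded keys gives
decreasing frequency with ordinary alphabetical order among ties.

```python
from collections import Counter

LIMIT = 1_000_000  # assumed upper bound on any count (illustrative)

def ranked(words):
    """Return (count, word) pairs: decreasing count, then ascending word."""
    counts = Counter(words)
    # Zero-padded complement: a larger count yields a smaller key prefix.
    keys = [f"{LIMIT - n:07d} {w}" for w, n in counts.items()]
    keys.sort()  # ordinary ascending string sort, no custom comparator
    result = []
    for key in keys:
        complement, word = key.split(" ", 1)
        result.append((LIMIT - int(complement), word))
    return result

print(ranked("b a b c a d".split()))
```

The fixed width matters: without the padding, "99" would sort after
"100" as a string.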

Concatenating sort keys is just a Perlish implementation
of the usual lexicographical ordering of tuples.
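
For comparison, a small Python sketch of the same ordering written
directly as a tuple sort (ranked_by_tuple is a hypothetical name, not
taken from any benchmark entry).  Tuples compare element by element,
which is exactly what a concatenated fixed-width key emulates.

```python
from collections import Counter

def ranked_by_tuple(words):
    """Return (word, count) pairs: decreasing count, then ascending word."""
    counts = Counter(words)
    # The key (-count, word) is compared lexicographically: first by
    # negated count, then by word when the counts are equal.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(ranked_by_tuple("b a b c a d".split()))
```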

Oh, and by the way, this 'trick' is the main (if not ONLY)
argument for big-endian machines -- text and integers
concatenated into one binary key sort correctly under a
plain byte-wise comparison.
(For extensibility and addressability little-endian
is clearly superior).
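
A rough Python illustration of the endianness point, using the standard
struct module.  The helper packed_key and the sample records are
invented for this sketch.  Each record is packed as a 4-byte unsigned
count followed by the word, and the packed keys are compared byte by
byte.

```python
import struct

def packed_key(count, word, byte_order=">"):
    """Pack a 4-byte unsigned count followed by the raw word bytes."""
    return struct.pack(byte_order + "I", count) + word.encode("ascii")

records = [(256, "b"), (1, "z"), (256, "a"), (2, "m")]

# Big-endian: most significant byte first, so byte order = numeric order.
big = sorted(records, key=lambda r: packed_key(*r, byte_order=">"))
# Little-endian: least significant byte first, so 256 sorts before 1.
little = sorted(records, key=lambda r: packed_key(*r, byte_order="<"))

print(big == sorted(records))     # True
print(little == sorted(records))  # False
```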