[Shootout-list] word-frequency
John Skaller
skaller@users.sourceforge.net
Mon, 20 Jun 2005 07:58:12 +1000
On Sat, 2005-06-18 at 11:48 +0200, Nicolas Neuss wrote:
> Hello,
>
> after Eric Lavigne and I wrote the CMUCL code for the word-frequency
> benchmark, I looked at how some languages manage to be even shorter than our
> version. As far as I can see, the main reason is that the requirements of the
> word frequency shootout are artificial (words have to be ordered in
> decreasing frequency AND reverse alphabetical order). This allows for the
> trick of sorting the strings obtained by concatenating word and frequency.
> IMO, this is a kludge because it only works for that unnatural alphabetical
> ordering.
>
> Question: wouldn't it be more telling about the power of a language, if the
> natural ordering (decreasing word frequency, correct alphabetical order)
> was required? Why should one contort a good test just to allow a kludge?
>
> Any comments?
Unfortunately, changing the test will have absolutely no impact.
All you do is reverse the polarity of the frequencies by
storing the value
1000000 - n
instead of n :)
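A minimal sketch of that polarity flip, in Python purely for illustration. The sample counts, the names LIMIT and WIDTH, and the fixed-width zero padding are assumptions of this sketch; it only works while every count stays below LIMIT.

```python
# Hypothetical sample counts, not taken from the benchmark input.
counts = {"the": 3, "and": 3, "cat": 1, "dog": 2}

LIMIT = 1000000  # assumption: every count is smaller than this bound
WIDTH = 7        # fixed width, so the digits compare correctly as text


def flipped_key(word, n):
    """Build a string key that sorts ascending by (LIMIT - n), then by word."""
    return f"{LIMIT - n:0{WIDTH}d} {word}"


# A plain ascending string sort now yields decreasing frequency
# with ties broken in normal alphabetical order.
keys = sorted(flipped_key(w, n) for w, n in counts.items())
ordered = [k.split(" ", 1)[1] for k in keys]
print(ordered)
```

So the "natural" ordering is reachable with the same concatenation approach, just with a different number stored in the key.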
Concatenating sort keys is just a Perlish implementation
of the usual lexicographical ordering of tuples.
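To illustrate the equivalence, here is a rough sketch (Python, hypothetical sample data) that produces the benchmark's ordering twice: once by comparing (count, word) tuples and once by comparing concatenated strings. The two agree here only because the count is zero-padded to a fixed width and the words are plain ASCII; both are assumptions of the sketch.

```python
# Hypothetical sample counts, not taken from the benchmark input.
counts = {"the": 3, "and": 3, "cat": 1, "dog": 2}

# Lexicographic ordering of tuples: compare counts first, then words.
by_tuple = sorted(((n, w) for w, n in counts.items()), reverse=True)

# The same ordering via concatenated string keys.
# The count is zero-padded to 7 digits so text comparison matches numeric order.
by_string = sorted((f"{n:07d} {w}" for w, n in counts.items()), reverse=True)

words_from_tuples = [w for _, w in by_tuple]
words_from_strings = [s.split(" ", 1)[1] for s in by_string]
print(words_from_tuples == words_from_strings)
```

Both give decreasing frequency with reverse alphabetical tie-breaking, which is the ordering the current test asks for.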
Oh, and by the way, this 'trick' is the main (if not ONLY)
argument for big-endian machines -- text and integers
concatenated as binary sort correctly as binary.
(For extensibility and addressability, little-endian
is clearly superior.)
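
The byte-order point can be checked with a short sketch using Python's struct module. The records and the helper name sort_packed are made up for illustration, and the integers are assumed to be unsigned 32-bit values.

```python
import struct

# Hypothetical (count, word) records; counts are assumed to fit in 32 bits.
records = [(3, "the"), (256, "and"), (2, "dog")]


def sort_packed(fmt):
    """Pack each record as a 4-byte integer plus ASCII text, sort the raw bytes, decode."""
    packed = sorted(struct.pack(fmt, n) + w.encode("ascii") for n, w in records)
    return [(struct.unpack(fmt, p[:4])[0], p[4:].decode("ascii")) for p in packed]


big_order = sort_packed(">I")     # big-endian: most significant byte first
little_order = sort_packed("<I")  # little-endian: least significant byte first

print(big_order)
print(little_order)
```

With the big-endian layout the byte-wise sort matches the numeric order of the counts. With the little-endian layout 256 is packed as 00 01 00 00 and sorts ahead of 2 and 3.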