[Shootout-list] Perl k-nucleotide benchmark

Joel Hoffman hoffmanj@pacifier.com
Fri, 25 Mar 2005 08:51:13 -0800


This is a multi-part message in MIME format.
--------------030106050006000906050706
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Joel Hoffman wrote:

> I'd like to contribute a simple Perl implementation of the 
> k-nucleotide benchmark. It could probably be substantially improved, 
> but it seems to work!
>
> Joel


... and here is a fixed version! (Well, two bugs squashed, anyway.)

Joel


--------------030106050006000906050706
Content-Type: application/x-perl;
 name="nucleotides.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="nucleotides.pl"

#!/usr/bin/perl

# The Great Computer Language Shootout
# http://shootout.alioth.debian.org/
# k-nucleotide count benchmark
# contributed by Joel Hoffman, 2005-03-24

use strict;

my @keys;
my $sequence = "";

$/ = ">"; # Input Record Separator
while (defined(my $l = <>)) {
   if ($l =~ /^THREE/i) {
      chomp $l;
      my @l = split /\n+/, $l;
      $sequence = uc join("",grep $_ !~ /^;/, @l[1..$#l]);
      last;
   }
}

for my $n (1,2,3,4,6,12,18) {
   for my $i (0..(length $sequence) - $n) {
      $keys[$n]{substr($sequence,$i,$n)}++;
   }
}

for my $k (@keys[1,2]) {
   printf "%s %4.2f\n",$_,(100 * $k->{$_})/(length $sequence)
     for sort { $k->{$b} <=> $k->{$a} or $a cmp $b }
       keys %$k;
   print "\n";
}

printf "%d\t%s\n", $keys[length $_]{$_}, $_
   for qw<GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT>;

--------------030106050006000906050706--