[Shootout-list] Perl k-nucleotide benchmark

Joel Hoffman hoffmanj@pacifier.com
Thu, 24 Mar 2005 20:48:31 -0800


This is a multi-part message in MIME format.
--------------060401040800090603010802
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Sorry, I meant to write from this address that is actually subscribed to 
the list. Please ignore my other one (from jahoffm@pdx.edu). Here it is 
again:

I'd like to contribute a simple Perl implementation of the k-nucleotide 
benchmark. It could probably be substantially improved, but it seems to 
work!

Joel

--------------060401040800090603010802
Content-Type: application/x-perl;
 name="nucleotides.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="nucleotides.pl"

#!/usr/bin/perl

# The Great Computer Language Shootout
# http://shootout.alioth.debian.org/
# k-nucleotide count benchmark
# contributed by Joel Hoffman, 2005-03-24

use strict;

my @keys;
my $sequence = "";

$/ = "\n>"; # Input Record Separator
while (defined(my $l = <>)) {
   if (uc substr($l,0,5) eq 'THREE') {
      chomp $l;
      my @l = split /\n+/, $l;
      $sequence = uc join("",grep $_ !~ /^;/, @l[1..$#l]);
      last;
   }
}

for my $n (1,2,3,4,6,12,18) {
   for my $i (0..(length $sequence) - $n) {
      $keys[$n]{substr($sequence,$i,$n)}++;
   }
}

for my $i (1,2) {
   printf "%s %4.2f\n",$_,(100 * $keys[$i]{$_})/(length $sequence)
     for sort { $keys[$i]{$b} <=> $keys[$i]{$a} or $a cmp $b }
       keys %{$keys[$i]};
   print "\n";
}

printf "%d\t%s\n", $keys[length $_]{$_}, $_
   for qw<GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT>;

--------------060401040800090603010802--