[Shootout-list] Perl k-nucleotide benchmark
Joel Hoffman
hoffmanj@pacifier.com
Fri, 25 Mar 2005 08:51:13 -0800
This is a multi-part message in MIME format.
--------------030106050006000906050706
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Joel Hoffman wrote:
> I'd like to contribute a simple Perl implementation of the
> k-nucleotide benchmark. It could probably be substantially improved,
> but it seems to work!
>
> Joel
... and here is a fixed version! (Well, two bugs squashed, anyway.)
Joel
--------------030106050006000906050706
Content-Type: application/x-perl;
name="nucleotides.pl"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="nucleotides.pl"
#!/usr/bin/perl
# The Great Computer Language Shootout
# http://shootout.alioth.debian.org/
# k-nucleotide count benchmark
# contributed by Joel Hoffman, 2005-03-24
use strict;
my @keys;
my $sequence = "";
$/ = ">"; # Input Record Separator
while (defined(my $l = <>)) {
if ($l =~ /^THREE/i) {
chomp $l;
my @l = split /\n+/, $l;
$sequence = uc join("",grep $_ !~ /^;/, @l[1..$#l]);
last;
}
}
for my $n (1,2,3,4,6,12,18) {
for my $i (0..(length $sequence) - $n) {
$keys[$n]{substr($sequence,$i,$n)}++;
}
}
for my $k (@keys[1,2]) {
printf "%s %4.2f\n",$_,(100 * $k->{$_})/(length $sequence)
for sort { $k->{$b} <=> $k->{$a} or $a cmp $b }
keys %$k;
print "\n";
}
printf "%d\t%s\n", $keys[length $_]{$_}, $_
for qw<GGT GGTA GGTATT GGTATTTTAATT GGTATTTTAATTTATAGT>;
--------------030106050006000906050706--