r21114 - in /branches/upstream/libparse-mediawikidump-perl/current: ./ examples/ lib/Parse/ t/
gregoa at users.alioth.debian.org
Sat Jun 14 21:10:35 UTC 2008
Author: gregoa
Date: Sat Jun 14 21:10:35 2008
New Revision: 21114
URL: http://svn.debian.org/wsvn/?sc=1&rev=21114
Log:
[svn-upgrade] Integrating new upstream version, libparse-mediawikidump-perl (0.51)
Added:
branches/upstream/libparse-mediawikidump-perl/current/TODO
branches/upstream/libparse-mediawikidump-perl/current/examples/
branches/upstream/libparse-mediawikidump-perl/current/examples/speed_test (with props)
branches/upstream/libparse-mediawikidump-perl/current/t/links_test.sql
branches/upstream/libparse-mediawikidump-perl/current/t/pages_test.xml
Removed:
branches/upstream/libparse-mediawikidump-perl/current/links_test.sql
branches/upstream/libparse-mediawikidump-perl/current/pages_test.xml
branches/upstream/libparse-mediawikidump-perl/current/t/pod-coverage.t
branches/upstream/libparse-mediawikidump-perl/current/t/pod.t
Modified:
branches/upstream/libparse-mediawikidump-perl/current/Changes
branches/upstream/libparse-mediawikidump-perl/current/MANIFEST
branches/upstream/libparse-mediawikidump-perl/current/META.yml
branches/upstream/libparse-mediawikidump-perl/current/Makefile.PL
branches/upstream/libparse-mediawikidump-perl/current/lib/Parse/MediaWikiDump.pm
branches/upstream/libparse-mediawikidump-perl/current/t/links-compat.t
branches/upstream/libparse-mediawikidump-perl/current/t/links.t
branches/upstream/libparse-mediawikidump-perl/current/t/pages-compat.t
branches/upstream/libparse-mediawikidump-perl/current/t/pages.t
Modified: branches/upstream/libparse-mediawikidump-perl/current/Changes
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/Changes?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/Changes (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/Changes Sat Jun 14 21:10:35 2008
@@ -1,6 +1,18 @@
Revision history for Parse-MediaWikiDump
-0.40
- Jun 21, 2006
+
+0.51 May 31, 2008
+ * Fix for bug 36255 "Parse::MediaWikiDump::page::namespace may return
+ a string which is not really a namespace" provided by Amir E. Aharoni.
+ * Moved test data into t/ and moved speed_test.pl into examples/
+ * Exceedingly complicated functions (parse_head() and parse_page()) are
+ not funny. Added some comments on how to rectify that situation.
+ * Tightened up the tests a little bit.
+
+0.50 Jun 27, 2006
+ * Added category links parser.
+ * Removed all instances of shift() from the code.
+
+0.40 Jun 21, 2006
* Increased processing speed by around 40%!
0.33 Jun 18, 2006
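
The 0.51 entry for bug 36255 can be illustrated with a minimal sketch. It assumes the fix works as the changelog describes: a title prefix counts as a namespace only if the dump's siteinfo declared it. The helper name namespace_of and the sample name list are hypothetical and are not part of Parse::MediaWikiDump.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper (not in the module): returns the namespace of a
# title, or '' when the prefix before the first colon is not a name
# declared in the dump's siteinfo.
sub namespace_of {
    my ($title, $known_names) = @_;

    # No colon, or a leading colon, means there is no prefix to check.
    return '' unless $title =~ m/^([^:]+):/;
    my $candidate = $1;

    # Accept the prefix only if it is one of the declared namespaces.
    my $match = first { $_ eq $candidate } @$known_names;
    return defined($match) ? $match : '';
}

# Made-up subset of namespace names for illustration.
my @names = ('Media', 'Special', '', 'Talk', 'User', 'Category');

print namespace_of('Talk:Foo', \@names), "\n";           # prints "Talk"
print namespace_of('NotANameSpace:Bar', \@names), "\n";  # prints an empty line
```

Before this release, any text before the first colon in a title was reported as the namespace, so a title such as NotANameSpace:Bar produced a bogus value.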
Modified: branches/upstream/libparse-mediawikidump-perl/current/MANIFEST
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/MANIFEST?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/MANIFEST (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/MANIFEST Sat Jun 14 21:10:35 2008
@@ -1,15 +1,15 @@
-pages_test.xml
-links_test.sql
Changes
MANIFEST
META.yml # Will be created by "make dist"
Makefile.PL
README
+examples/speed_test
lib/Parse/MediaWikiDump.pm
t/00-load.t
-t/pod-coverage.t
-t/pod.t
t/pages.t
t/links.t
t/pages-compat.t
t/links-compat.t
+t/pages_test.xml
+t/links_test.sql
+TODO
Modified: branches/upstream/libparse-mediawikidump-perl/current/META.yml
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/META.yml?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/META.yml (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/META.yml Sat Jun 14 21:10:35 2008
@@ -1,10 +1,11 @@
# http://module-build.sourceforge.net/META-spec.html
#XXXXXXX This is a prototype!!! It will change in the future!!! XXXXX#
name: Parse-MediaWikiDump
-version: 0.40
+version: 0.51
version_from: lib/Parse/MediaWikiDump.pm
installdirs: site
requires:
+ List::Util: 0
Test::More: 0
XML::Parser: 0
Modified: branches/upstream/libparse-mediawikidump-perl/current/Makefile.PL
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/Makefile.PL?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/Makefile.PL (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/Makefile.PL Sat Jun 14 21:10:35 2008
@@ -9,6 +9,7 @@
ABSTRACT_FROM => 'lib/Parse/MediaWikiDump.pm',
PL_FILES => {},
PREREQ_PM => {
+ 'List::Util' => 0,
'Test::More' => 0,
'XML::Parser' => 0,
},
Added: branches/upstream/libparse-mediawikidump-perl/current/TODO
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/TODO?rev=21114&op=file
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/TODO (added)
+++ branches/upstream/libparse-mediawikidump-perl/current/TODO Sat Jun 14 21:10:35 2008
@@ -1,0 +1,4 @@
+ * Use a template system to integrate examples in the source distribution
+ right into the POD.
+ * Use a template system to perform automatic version information maintenance
+ across all the files that need version info.
Added: branches/upstream/libparse-mediawikidump-perl/current/examples/speed_test
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/examples/speed_test?rev=21114&op=file
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/examples/speed_test (added)
+++ branches/upstream/libparse-mediawikidump-perl/current/examples/speed_test Sat Jun 14 21:10:35 2008
@@ -1,0 +1,54 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+
+use Parse::MediaWikiDump;
+
+$SIG{ALRM} = \&progress;
+
+$| = 1;
+print '';
+
+my $i = 0;
+my $file = shift(@ARGV);
+
+my $num_iter = shift(@ARGV);
+$num_iter = 10 unless defined($num_iter);
+
+my $start = time;
+my $dump = undef;
+
+alarm(1);
+
+while($i++ < $num_iter) {
+ $start = time;
+
+ print "Iteration $i\r";
+ $dump = Parse::MediaWikiDump::Pages->new($file);
+
+ while($dump->next) { };
+
+ print "\n";
+}
+
+my @times = times;
+
+print $times[0] + $times[1], "\n";
+
+sub progress {
+ return unless defined($dump);
+ my $elapsed = time - $start;
+
+ $elapsed = 1 if $elapsed == 0;
+
+ print "Iteration $i: ";
+
+ print int($dump->current_byte / $dump->size * 100), "% ";
+
+ my $speed = int($dump->current_byte / $elapsed);
+
+ print $speed, " bytes per second \r";
+
+ alarm(1);
+}
Propchange: branches/upstream/libparse-mediawikidump-perl/current/examples/speed_test
------------------------------------------------------------------------------
svn:executable = *
Modified: branches/upstream/libparse-mediawikidump-perl/current/lib/Parse/MediaWikiDump.pm
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/lib/Parse/MediaWikiDump.pm?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/lib/Parse/MediaWikiDump.pm (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/lib/Parse/MediaWikiDump.pm Sat Jun 14 21:10:35 2008
@@ -1,9 +1,7 @@
package Parse::MediaWikiDump;
-our $VERSION = '0.40';
+our $VERSION = '0.51';
+
#the POD is at the end of this file
-#avoid shift() - it is computationally more expensive than pop
-#and shifting values for subroutine input should be avoided in
-#any subroutines that get called often, like the handlers
package Parse::MediaWikiDump::Pages;
@@ -16,22 +14,22 @@
use strict;
use warnings;
+use List::Util;
use XML::Parser;
#tokens in the buffer are an array ref with the 0th element specifying
#its type; these are the constants for those types.
sub new {
- my $class = shift;
- my $source = shift;
+ my ($class, $source) = @_;
my $self = {};
bless($self, $class);
$$self{PARSER} = XML::Parser->new(ProtocolEncoding => 'UTF-8');
$$self{PARSER}->setHandlers('Start', \&start_handler,
- 'End', \&end_handler);
- $$self{EXPAT} = $$self{PARSER}->parse_start(state => $self);
+ 'End', \&end_handler);
+ $$self{EXPAT} = $$self{PARSER}->parse_start(state => $self);
$$self{BUFFER} = [];
$$self{CHUNK_SIZE} = 32768;
$$self{BUF_LIMIT} = 10000;
@@ -45,7 +43,7 @@
}
sub next {
- my $self = shift;
+ my ($self) = @_;
my $buffer = $$self{BUFFER};
my $offset;
my @page;
@@ -75,9 +73,12 @@
#outputs a nicely formatted representation of the tokens on the buffer specified
sub dump {
- my $self = shift;
- my $buffer = shift || $$self{BUFFER};
+ my ($self, $buffer) = @_;
my $offset = 0;
+
+ if (! defined($buffer)) {
+ $buffer = $$self{BUFFER};
+ }
foreach my $i (0 .. $#$buffer) {
my $token = $$buffer[$i];
@@ -118,37 +119,42 @@
}
sub sitename {
+ my ($self) = @_;
+ return $$self{HEAD}{sitename};
+}
+
+sub base {
+ my ($self) = @_;
+ return $$self{HEAD}{base};
+}
+
+sub generator {
+ my ($self) = @_;
+ return $$self{HEAD}{generator};
+}
+
+sub case {
+ my ($self) = @_;
+ return $$self{HEAD}{case};
+}
+
+sub namespaces {
+ my ($self) = @_;
+ return $$self{HEAD}{namespaces};
+}
+
+sub namespaces_names {
my $self = shift;
- return $$self{HEAD}{sitename};
-}
-
-sub base {
- my $self = shift;
- return $$self{HEAD}{base};
-}
-
-sub generator {
- my $self = shift;
- return $$self{HEAD}{generator};
-}
-
-sub case {
- my $self = shift;
- return $$self{HEAD}{case};
-}
-
-sub namespaces {
- my $self = shift;
- return $$self{HEAD}{namespaces};
+ return $$self{HEAD}{namespaces_names};
}
sub current_byte {
- my $self = shift;
+ my ($self) = @_;
return $$self{BYTE};
}
sub size {
- my $self = shift;
+ my ($self) = @_;
return undef unless defined $$self{SOURCE_FILE};
@@ -161,14 +167,13 @@
#replaced by next()
sub page {
- my $self = shift;
+ my ($self) = @_;
return $self->next(@_);
}
#private functions with OO interface
sub open {
- my $self = shift;
- my $source = shift;
+ my ($self, $source) = @_;
if (ref($source) eq 'GLOB') {
$$self{SOURCE} = $source;
@@ -186,7 +191,7 @@
}
sub init {
- my $self = shift;
+ my ($self) = @_;
my $offset;
my @head;
@@ -248,12 +253,16 @@
return -1;
}
-#this function is very frightning =)
+#this function is very frightening :-(
+#a better alternative would be to have each part of the stack handled by a
+#function that handles all the logic for that specific node in the tree
sub parse_head {
- my $self = shift;
- my $buffer = shift;
+ my ($self, $buffer) = @_;
my $state = 'start';
- my %data = (namespaces => []);
+ my %data = (
+ namespaces => [],
+ namespaces_names => [],
+ );
for (my $i = 0; $i <= $#$buffer; $i++) {
my $token = $$buffer[$i];
@@ -375,6 +384,7 @@
}
push(@{$data{namespaces}}, [$key, $name]);
+ push(@{$data{namespaces_names}}, $name);
$token = $$buffer[++$i];
@@ -408,10 +418,11 @@
return 1;
}
-#this function is very frightning =)
+#this function is very frightening :-(
+#see the parse_head function comments for thoughts on improving these
+#awful functions
sub parse_page {
- my $self = shift;
- my $buffer = shift;
+ my ($self, $buffer) = @_;
my %data;
my $state = 'start';
@@ -621,6 +632,18 @@
}
} else {
die "unknown state: $state";
+ }
+ }
+
+ $data{namespace} = '';
+ # Many pages just have a : in the title, but it's not necessarily
+ # a namespace designation.
+ if ($data{title} =~ m/^([^:]+)\:/) {
+ my $possible_namespace = $1;
+ if (List::Util::first { $_ eq $possible_namespace }
+ @{ $self->namespaces_names() })
+ {
+ $data{namespace} = $possible_namespace;
}
}
@@ -647,7 +670,7 @@
}
sub token2text {
- my $token = shift;
+ my ($token) = @_;
if (ref $token eq 'ARRAY') {
return "<$$token[0]>";
@@ -674,9 +697,9 @@
sub start_handler {
my ($p, $tag, %atts) = @_;
my $self = $p->{state};
- my $good_tags = $self->{GOOD_TAGS};
-
- push @{ $self->{BUFFER} }, [$tag, \%atts];
+ my $good_tags = $$self{GOOD_TAGS};
+
+ push @{ $$self{BUFFER} }, [$tag, \%atts];
if (defined($good_tags->{$tag})) {
$p->setHandlers(Char => \&char_handler);
@@ -689,7 +712,7 @@
my ($p, $tag) = @_;
my $self = $p->{state};
- push @{ $self->{BUFFER} }, ["/$tag"];
+ push @{ $$self{BUFFER} }, ["/$tag"];
$p->setHandlers(Char => undef);
@@ -730,23 +753,13 @@
}
sub namespace {
- my $self = shift;
-
- return $$self{CACHE}{namespace} if defined($$self{CACHE}{namespace});
-
- my $title = $$self{DATA}{title};
-
- if ($title =~ m/^([^:]+)\:/) {
- $$self{CACHE}{namespace} = $1;
- return $1;
- } else {
- $$self{CACHE}{namespace} = '';
- return '';
- }
+ my ($self) = @_;
+
+ return $$self{DATA}{namespace};
}
sub categories {
- my $self = shift;
+ my ($self) = @_;
my $anchor = $$self{CATEGORY_ANCHOR};
return $$self{CACHE}{categories} if defined($$self{CACHE}{categories});
@@ -770,7 +783,7 @@
}
sub redirect {
- my $self = shift;
+ my ($self) = @_;
my $text = $$self{DATA}{text};
return $$self{CACHE}{redirect} if exists($$self{CACHE}{redirect});
@@ -785,42 +798,42 @@
}
sub title {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{title};
}
sub id {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{id};
}
sub revision_id {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{revision_id};
}
sub timestamp {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{timestamp};
}
sub username {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{username};
}
sub userid {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{userid};
}
sub minor {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{minor};
}
sub text {
- my $self = shift;
+ my ($self) = @_;
return $$self{DATA}{text};
}
@@ -830,8 +843,7 @@
use warnings;
sub new {
- my $class = shift;
- my $source = shift;
+ my ($class, $source) = @_;
my $self = {};
$$self{BUFFER} = [];
@@ -844,7 +856,7 @@
}
sub next {
- my $self = shift;
+ my ($self) = @_;
my $buffer = $$self{BUFFER};
my $link;
@@ -862,7 +874,7 @@
#private functions with OO interface
sub parse_more {
- my $self = shift;
+ my ($self) = @_;
my $source = $$self{SOURCE};
my $need_data = 1;
@@ -886,8 +898,7 @@
}
sub open {
- my $self = shift;
- my $source = shift;
+ my ($self, $source) = @_;
if (ref($source) ne 'GLOB') {
die "could not open $source: $!" unless
@@ -902,7 +913,7 @@
}
sub init {
- my $self = shift;
+ my ($self) = @_;
my $source = $$self{SOURCE};
my $found = 0;
@@ -920,7 +931,7 @@
#replaced by next()
sub link {
- my $self = shift;
+ my ($self) = @_;
$self->next(@_);
}
@@ -928,8 +939,7 @@
#you must pass in a fully populated link array reference
sub new {
- my $class = shift;
- my $self = shift;
+ my ($class, $self) = @_;
bless($self, $class);
@@ -937,21 +947,305 @@
}
sub from {
- my $self = shift;
+ my ($self) = @_;
return $$self[0];
}
sub namespace {
- my $self = shift;
+ my ($self) = @_;
return $$self[1];
}
sub to {
- my $self = shift;
+ my ($self) = @_;
return $$self[2];
}
-
+package Parse::MediaWikiDump::CategoryLinks;
+
+use strict;
+use warnings;
+
+sub new {
+ my ($class, $source) = @_;
+ my $self = {};
+
+ $$self{BUFFER} = [];
+ $$self{BYTE} = 0;
+
+ bless($self, $class);
+
+ $self->open($source);
+ $self->init;
+
+ return $self;
+}
+
+sub next {
+ my ($self) = @_;
+ my $buffer = $$self{BUFFER};
+ my $link;
+
+ while(1) {
+ if (defined($link = pop(@$buffer))) {
+ last;
+ }
+
+ #signals end of input
+ return undef unless $self->parse_more;
+ }
+
+ return Parse::MediaWikiDump::category_link->new($link);
+}
+
+#private functions with OO interface
+sub parse_more {
+ my ($self) = @_;
+ my $source = $$self{SOURCE};
+ my $need_data = 1;
+
+ while($need_data) {
+ my $line = <$source>;
+
+ last unless defined($line);
+
+ $$self{BYTE} += length($line);
+
+ while($line =~ m/\((\d+),'(.*?)','(.*?)',(\d+)\)[;,]/g) {
+ push(@{$$self{BUFFER}}, [$1, $2, $3, $4]);
+ $need_data = 0;
+ }
+ }
+
+ #if we still need data and we are here it means we ran out of input
+ if ($need_data) {
+ return 0;
+ }
+
+ return 1;
+}
+
+sub open {
+ my ($self, $source) = @_;
+
+ if (ref($source) ne 'GLOB') {
+ die "could not open $source: $!" unless
+ open($$self{SOURCE}, $source);
+
+ $$self{SOURCE_FILE} = $source;
+ } else {
+ $$self{SOURCE} = $source;
+ }
+
+ binmode($$self{SOURCE}, ':utf8');
+
+ return 1;
+}
+
+sub init {
+ my ($self) = @_;
+ my $source = $$self{SOURCE};
+ my $found = 0;
+
+ while(<$source>) {
+ if (m/^LOCK TABLES `categorylinks` WRITE;/) {
+ $found = 1;
+ last;
+ }
+ }
+
+ die "not a MediaWiki link dump file" unless $found;
+}
+
+sub current_byte {
+ my ($self) = @_;
+
+ return $$self{BYTE};
+}
+
+sub size {
+ my ($self) = @_;
+
+ return undef unless defined $$self{SOURCE_FILE};
+
+ my @stat = stat($$self{SOURCE_FILE});
+
+ return $stat[7];
+}
+
+package Parse::MediaWikiDump::category_link;
+
+#you must pass in a fully populated link array reference
+sub new {
+ my ($class, $self) = @_;
+
+ bless($self, $class);
+
+ return $self;
+}
+
+sub from {
+ my ($self) = @_;
+ return $$self[0];
+}
+
+sub to {
+ my ($self) = @_;
+ return $$self[1];
+}
+
+sub sortkey {
+ my ($self) = @_;
+ return $$self[2];
+}
+
+sub timestamp {
+ my ($self) = @_;
+ return $$self[3];
+}
+
+#package Parse::MediaWikiDump::ExternalLinks;
+#
+#use strict;
+#use warnings;
+#
+#sub new {
+# my ($class, $source) = @_;
+# my $self = {};
+#
+# $$self{BUFFER} = [];
+# $$self{BYTE} = 0;
+#
+# bless($self, $class);
+#
+# $self->open($source);
+# $self->init;
+#
+# return $self;
+#}
+#
+#sub next {
+# my ($self) = @_;
+# my $buffer = $$self{BUFFER};
+# my $link;
+#
+# while(1) {
+# if (defined($link = pop(@$buffer))) {
+# last;
+# }
+#
+# #signals end of input
+# return undef unless $self->parse_more;
+# }
+#
+# return Parse::MediaWikiDump::external_link->new($link);
+#}
+#
+##private functions with OO interface
+#sub parse_more {
+# my ($self) = @_;
+# my $source = $$self{SOURCE};
+# my $need_data = 1;
+#
+# while($need_data) {
+# my $line = <$source>;
+#
+# last unless defined($line);
+#
+# $$self{BYTE} += length($line);
+#
+# while($line =~ m/\((\d+),'(.*?)','(.*?)'\)[;,]/g) {
+# push(@{$$self{BUFFER}}, [$1, $2, $3]);
+# $need_data = 0;
+# }
+# }
+#
+# #if we still need data and we are here it means we ran out of input
+# if ($need_data) {
+# return 0;
+# }
+#
+# return 1;
+#}
+#
+#sub open {
+# my ($self, $source) = @_;
+#
+# if (ref($source) ne 'GLOB') {
+# die "could not open $source: $!" unless
+# open($$self{SOURCE}, $source);
+#
+# $$self{SOURCE_FILE} = $source;
+# } else {
+# $$self{SOURCE} = $source;
+# }
+#
+# binmode($$self{SOURCE}, ':utf8');
+#
+# return 1;
+#}
+#
+#sub init {
+# my ($self) = @_;
+# my $source = $$self{SOURCE};
+# my $found = 0;
+#
+# while(<$source>) {
+# if (m/^LOCK TABLES `externallinks` WRITE;/) {
+# $found = 1;
+# last;
+# }
+# }
+#
+# die "not a MediaWiki link dump file" unless $found;
+#}
+#
+#sub current_byte {
+# my ($self) = @_;
+#
+# return $$self{BYTE};
+#}
+#
+#sub size {
+# my ($self) = @_;
+#
+# return undef unless defined $$self{SOURCE_FILE};
+#
+# my @stat = stat($$self{SOURCE_FILE});
+#
+# return $stat[7];
+#}
+#
+#package Parse::MediaWikiDump::external_link;
+#
+##you must pass in a fully populated link array reference
+#sub new {
+# my ($class, $self) = @_;
+#
+# bless($self, $class);
+#
+# return $self;
+#}
+#
+#sub from {
+# my ($self) = @_;
+# return $$self[0];
+#}
+#
+#sub to {
+# my ($self) = @_;
+# return $$self[1];
+#}
+#
+#sub index {
+# my ($self) = @_;
+# return $$self[2];
+#}
+#
+#sub timestamp {
+# my ($self) = @_;
+# return $$self[3];
+#
1;
__END__
@@ -985,6 +1279,7 @@
$pages->generator;
$pages->case;
$pages->namespaces;
+ $pages->namespaces_names;
$pages->current_byte;
$pages->size;
@@ -1081,6 +1376,16 @@
namespace number and the second is the namespace name. In the case of namespace
0 the text stored for the name is ''.
+=item $pages->namespaces_names
+
+Returns an array reference to a list of namespace names only; this is a
+one-dimensional array of plain-text string values.
+
+=item $pages->namespaces
+
+Returns an array reference to the list of namespace names in the instance,
+without namespace numbers. The main namespace name is ''.
+
=item $pages->current_byte
Returns the number of bytes parsed so far.
@@ -1395,7 +1700,11 @@
=head1 AUTHOR
-This module was created and documented by Tyler Riddle E<lt>triddle at gmail.comE<gt>.
+This module was created, documented, and is maintained by
+Tyler Riddle E<lt>triddle at gmail.comE<gt>.
+
+Fix for bug 36255 "Parse::MediaWikiDump::page::namespace may return a string
+which is not really a namespace" provided by Amir E. Aharoni.
=head1 BUGS
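
The row matching done by the new CategoryLinks parser can be sketched in isolation. The example below reuses the pattern from parse_more() in the diff above; the helper name parse_category_rows and the sample INSERT line are made up for illustration and do not come from a real dump.

```perl
use strict;
use warnings;

# Hypothetical helper: extracts [from, to, sortkey, timestamp] tuples
# from one line of a categorylinks SQL dump, using the same pattern
# as parse_more() in Parse::MediaWikiDump::CategoryLinks.
sub parse_category_rows {
    my ($line) = @_;
    my @rows;

    # Each row looks like (123,'Category','Sortkey',20050709184110)
    # and is followed by a comma or a semicolon.
    while ($line =~ m/\((\d+),'(.*?)','(.*?)',(\d+)\)[;,]/g) {
        push @rows, [$1, $2, $3, $4];
    }

    return \@rows;
}

# Made-up sample data in the shape the pattern expects.
my $line = "INSERT INTO `categorylinks` VALUES "
         . "(12,'Physics','Atom',20050709184110),"
         . "(15,'Physics','Quark',20050709184111);";

foreach my $row (@{ parse_category_rows($line) }) {
    my ($from, $to, $sortkey, $timestamp) = @$row;
    print "$from -> $to (sortkey: $sortkey, timestamp: $timestamp)\n";
}
```

Because the pattern relies on non-greedy matches between quotes, a value that itself contains the sequence ',' could be split in the wrong place, so this should be read as a sketch of the approach and not as a complete SQL parser. Note also that next() in the module takes rows from the buffer with pop, so rows matched on one line appear to come back last-first.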
Modified: branches/upstream/libparse-mediawikidump-perl/current/t/links-compat.t
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/t/links-compat.t?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/t/links-compat.t (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/t/links-compat.t Sat Jun 14 21:10:35 2008
@@ -5,7 +5,7 @@
use warnings;
use Parse::MediaWikiDump;
-my $file = 'links_test.sql';
+my $file = 't/links_test.sql';
my $links = Parse::MediaWikiDump::Links->new($file);
Modified: branches/upstream/libparse-mediawikidump-perl/current/t/links.t
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/t/links.t?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/t/links.t (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/t/links.t Sat Jun 14 21:10:35 2008
@@ -5,7 +5,7 @@
use warnings;
use Parse::MediaWikiDump;
-my $file = 'links_test.sql';
+my $file = 't/links_test.sql';
my $links = Parse::MediaWikiDump::Links->new($file);
Added: branches/upstream/libparse-mediawikidump-perl/current/t/links_test.sql
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/t/links_test.sql?rev=21114&op=file
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/t/links_test.sql (added)
+++ branches/upstream/libparse-mediawikidump-perl/current/t/links_test.sql Sat Jun 14 21:10:35 2008
@@ -1,0 +1,27 @@
+-- MySQL dump 9.11
+--
+-- Host: benet Database: simplewiki
+-- ------------------------------------------------------
+-- Server version 4.0.22-log
+
+--
+-- Table structure for table `pagelinks`
+--
+
+DROP TABLE IF EXISTS `pagelinks`;
+CREATE TABLE `pagelinks` (
+ `pl_from` int(8) unsigned NOT NULL default '0',
+ `pl_namespace` int(11) NOT NULL default '0',
+ `pl_title` varchar(255) binary NOT NULL default '',
+ UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
+ KEY `pl_namespace` (`pl_namespace`,`pl_title`)
+) TYPE=InnoDB;
+
+--
+-- Dumping data for table `pagelinks`
+--
+
+
+/*!40000 ALTER TABLE `pagelinks` DISABLE KEYS */;
+LOCK TABLES `pagelinks` WRITE;
+INSERT INTO `pagelinks` VALUES (7759,-1,'Recentchanges'),(4016,0,'\"Captain\"_Lou_Albano'),(7491,0,'\"Captain\"_Lou_Albano'),(9935,0,'\"Dimebag\"_Darrell'),(7617,0,'\"Hawkeye\"_Pierce'),(1495,0,'$1'),(1495,0,'$2'),(4901,0,'\',_art_title,_\''),(4376,0,'\'Abd_Al-Rahman_Al_Sufi'),(12418,0,'\'Allo_\'Allo!'),(4045,0,'\'Newton\'s_cradle\'_toy'),(4045,0,'\'Push-and-go\'_toy_car'),(7794,0,'\'Salem\'s_Lot'),(4670,0,'(2340_Hathor'),(1876,0,'(Mt.'),(4400,0,'(c)Brain'),(3955,0,'...Baby_One_More_Time_(single)');
Modified: branches/upstream/libparse-mediawikidump-perl/current/t/pages-compat.t
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/t/pages-compat.t?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/t/pages-compat.t (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/t/pages-compat.t Sat Jun 14 21:10:35 2008
@@ -4,7 +4,7 @@
use strict;
use Parse::MediaWikiDump;
-my $file = 'pages_test.xml';
+my $file = 't/pages_test.xml';
my $fh;
test_all($file);
Modified: branches/upstream/libparse-mediawikidump-perl/current/t/pages.t
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/t/pages.t?rev=21114&op=diff
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/t/pages.t (original)
+++ branches/upstream/libparse-mediawikidump-perl/current/t/pages.t Sat Jun 14 21:10:35 2008
@@ -1,10 +1,10 @@
#!perl -w
-use Test::Simple tests => 46;
+use Test::Simple tests => 74;
use strict;
use Parse::MediaWikiDump;
-my $file = 'pages_test.xml';
+my $file = 't/pages_test.xml';
my $fh;
my $pages;
@@ -20,12 +20,18 @@
test_one();
test_two();
test_three();
+ test_four();
+
+ ok(! defined($pages->next));
}
sub test_one {
my $page = $pages->next;
my $text = $page->text;
+ ok(defined($page));
+
+ ok($page->namespace eq '');
ok($pages->sitename eq 'Sitename Test Value');
ok($pages->base eq 'Base Test Value');
ok($pages->generator eq 'Generator Test Value');
@@ -42,6 +48,8 @@
sub test_two {
my $page = $pages->next;
+ ok(defined($page));
+ ok($page->namespace eq '');
ok($page->redirect eq 'fooooo');
ok($page->title eq 'Title Test Value #2');
ok($page->id == 2);
@@ -53,6 +61,8 @@
sub test_three {
my $page = $pages->next;
+ ok(defined($page));
+ ok($page->namespace eq '');
ok($page->redirect eq 'fooooo');
ok($page->title eq 'Title Test Value #3');
ok($page->id == 3);
@@ -60,3 +70,18 @@
ok($page->username eq 'Username Test Value');
ok($page->userid == 1292);
}
+
+sub test_four {
+ my $page = $pages->next;
+
+ ok(defined($page));
+
+ ok($page->id == 4);
+ ok($page->timestamp eq '2005-07-09T18:41:10Z');
+ ok($page->username eq 'Username Test Value');
+ ok($page->userid == 1292);
+
+ #test for bug 36255
+ ok($page->namespace eq '');
+ ok($page->title eq 'NotANameSpace:Bar');
+}
Added: branches/upstream/libparse-mediawikidump-perl/current/t/pages_test.xml
URL: http://svn.debian.org/wsvn/branches/upstream/libparse-mediawikidump-perl/current/t/pages_test.xml?rev=21114&op=file
==============================================================================
--- branches/upstream/libparse-mediawikidump-perl/current/t/pages_test.xml (added)
+++ branches/upstream/libparse-mediawikidump-perl/current/t/pages_test.xml Sat Jun 14 21:10:35 2008
@@ -1,0 +1,87 @@
+<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="simple">
+<siteinfo>
+ <sitename>Sitename Test Value</sitename>
+ <base>Base Test Value</base>
+ <generator>Generator Test Value</generator>
+ <case>Case Test Value</case>
+ <namespaces>
+ <namespace key="-2">Media</namespace>
+ <namespace key="-1">Special</namespace>
+ <namespace key="0" />
+ <namespace key="1">Talk</namespace>
+ <namespace key="2">User</namespace>
+ <namespace key="3">User talk</namespace>
+ <namespace key="4">Wikipedia</namespace>
+ <namespace key="5">Wikipedia talk</namespace>
+ <namespace key="6">Image</namespace>
+ <namespace key="7">Image talk</namespace>
+ <namespace key="8">MediaWiki</namespace>
+ <namespace key="9">MediaWiki talk</namespace>
+ <namespace key="10">Template</namespace>
+ <namespace key="11">Template talk</namespace>
+ <namespace key="12">Help</namespace>
+ <namespace key="13">Help talk</namespace>
+ <namespace key="14">Category</namespace>
+ <namespace key="15">Category talk</namespace>
+ </namespaces>
+</siteinfo>
+<page>
+ <title>Title Test Value</title>
+ <id>1</id>
+ <revision>
+ <id>47084</id>
+ <timestamp>2005-07-09T18:41:10Z</timestamp>
+ <contributor><username>Username Test Value</username><id>1292</id></contributor>
+ <minor/>
+ <comment>Comment Test Value</comment>
+ <text xml:space="preserve">Text Test Value
+</text>
+ </revision>
+</page>
+
+<page>
+ <title>Title Test Value #2</title>
+ <id>2</id>
+ <revision>
+ <id>47085</id>
+ <timestamp>2005-07-09T18:41:10Z</timestamp>
+ <contributor><username>Username Test Value</username><id>1292</id></contributor>
+ <minor/>
+ <comment>Comment Test Value</comment>
+ <text xml:space="preserve">#redirect : [[fooooo]]
+</text>
+ </revision>
+</page>
+
+<page>
+ <title>Title Test Value #3</title>
+ <id>3</id>
+ <revision>
+ <id>47086</id>
+ <timestamp>2005-07-09T18:41:10Z</timestamp>
+ <contributor><username>Username Test Value</username><id>1292</id></contributor>
+ <minor/>
+ <comment>Comment Test Value</comment>
+ <text xml:space="preserve">#redirect [[fooooo]]
+</text>
+ </revision>
+</page>
+
+<page>
+ <title>NotANameSpace:Bar</title>
+ <id>4</id>
+ <revision>
+ <id>47088</id>
+ <timestamp>2005-07-09T18:41:10Z</timestamp>
+ <contributor><username>Username Test Value</username><id>1292</id></contributor>
+ <minor/>
+ <comment>Comment Test Value</comment>
+ <text xml:space="preserve">
+ test for bug #36255 -
+ Parse::MediaWikiDump::page::namespace may return a string
+ which is not really a namespace
+ </text>
+ </revision>
+</page>
+
+</mediawiki>