[liburi-find-perl] 15/31: new upstream

gregor herrmann gregoa at debian.org
Sat Jul 25 17:41:48 UTC 2015


This is an automated email from the git hooks/post-receive script.

gregoa pushed a commit to branch master
in repository liburi-find-perl.

commit 69fbe67c7f43abf0ab40535ea577d76707150793
Author: Dominic Hargreaves <dom at earth.li>
Date:   Sun Jun 13 20:10:50 2010 +0000

    new upstream
---
 Build.PL                   |   4 +-
 Changes                    |  28 ++++
 MANIFEST                   |   5 +
 MANIFEST.SKIP              |   1 +
 META.yml                   |  42 +++---
 SIGNATURE                  |  33 +++--
 bin/urifind                | 317 +++++++++++++++++++++++++++++++++++++++++++++
 debian/changelog           |   6 +
 lib/URI/Find.pm            |  49 +++++--
 lib/URI/Find/Schemeless.pm |   6 +-
 t/Find.t                   |   3 +
 t/html.t                   |  38 ++++++
 t/rfc3986_appendix_c.t     |   9 +-
 t/urifind/find.t           |  44 +++++++
 t/urifind/pod.t            |  17 +++
 t/urifind/sciencenews      |  95 ++++++++++++++
 16 files changed, 642 insertions(+), 55 deletions(-)

diff --git a/Build.PL b/Build.PL
index 9136b16..b10c6e1 100644
--- a/Build.PL
+++ b/Build.PL
@@ -32,7 +32,9 @@ my $build = Module::Build->new(
             bugtracker      => 'http://rt.cpan.org/Public/Dist/Display.html?Name=URI-Find',
             repository      => 'http://github.com/schwern/uri-find/tree/master',
         }
-    }
+    },
+
+    recursive_test_files        => 1,
 );
 
 $build->create_build_script;
diff --git a/Changes b/Changes
index 78847da..494050e 100644
--- a/Changes
+++ b/Changes
@@ -1,5 +1,33 @@
 Revision history for Perl module URI::Find.
 
+20100505  Wed May  5 18:48:44 PDT 2010
+    Test Fixes
+    * Fixed t/urifind/find.t on Windows
+
+
+20100504.1039  Tue May  4 10:39:23 PDT 2010
+    Doc Fixes
+    * Forgot to mention that we ship with urifind now.
+
+
+20100504  Tue May  4 10:29:52 PDT 2010
+    New Features
+    * Added a urifind program. (Darren Chamberlain)
+    Bug Fixes
+    * The final semi-colon was being strippped form URLs found in HTML
+      that ended with HTML entities. (Michael Peters)
+      Example: http://google.com/search?q=<html>
+    * URLs with leading dots, pluses and minuses are now found.
+      [rt.cpan.org 57032]
+      Example: stuff...http://example.com
+
+
+20100211  Thu Feb 11 04:02:26 PST 2010
+    Bug Fixes
+    * Finding URIs inside brackets was pretty badly broken by
+      the last release.  (Michael Peters)
+
+
 20090319  Thu Mar 19 12:17:53 PDT 2009
     Bug Fixes
     * Schemeless now ignores the case of the TLD.
diff --git a/MANIFEST b/MANIFEST
index d0a4422..7ad30c0 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1,3 +1,4 @@
+bin/urifind
 Build.PL
 Changes
 INSTALL
@@ -9,8 +10,12 @@ META.yml
 README
 t/filter.t
 t/Find.t
+t/html.t
 t/is_schemed.t
 t/load-schemeless.t
 t/rfc3986_appendix_c.t
+t/urifind/find.t
+t/urifind/pod.t
+t/urifind/sciencenews
 TODO
 SIGNATURE    Added here by Module::Build
diff --git a/MANIFEST.SKIP b/MANIFEST.SKIP
index 9045d98..bf2fbf0 100644
--- a/MANIFEST.SKIP
+++ b/MANIFEST.SKIP
@@ -34,3 +34,4 @@
 \bcover_db\b
 #!end included /usr/local/perl/5.10.0/lib/5.10.0/ExtUtils/MANIFEST.SKIP
 
+^MYMETA.yml$
diff --git a/META.yml b/META.yml
index 643a5d0..b15e1aa 100644
--- a/META.yml
+++ b/META.yml
@@ -1,32 +1,32 @@
 ---
-name: URI-Find
-version: 20090319
+abstract: 'Find URIs in arbitrary text'
 author:
   - 'Michael G Schwern <schwern at pobox.com>'
-abstract: Find URIs in arbitrary text
-license: perl
-resources:
-  bugtracker: http://rt.cpan.org/Public/Dist/Display.html?Name=URI-Find
-  homepage: http://search.cpan.org/dist/URI-Find
-  license: http://dev.perl.org/licenses/
-  repository: http://github.com/schwern/uri-find/tree/master
-configure_requires:
-  Module::Build: 0.30
-requires:
-  URI: 1.00
-  URI::URL: 5.00
-  perl: 5.6.0
 build_requires:
   Module::Build: 0.30
   Test::More: 0.82
+configure_requires:
+  Module::Build: 0.30
+generated_by: 'Module::Build version 0.3607'
+license: perl
+meta-spec:
+  url: http://module-build.sourceforge.net/META-spec-v1.4.html
+  version: 1.4
+name: URI-Find
 provides:
   URI::Find:
     file: lib/URI/Find.pm
-    version: 20090319
+    version: 20100505
   URI::Find::Schemeless:
     file: lib/URI/Find/Schemeless.pm
-    version: 20090319
-generated_by: Module::Build version 0.3103
-meta-spec:
-  url: http://module-build.sourceforge.net/META-spec-v1.2.html
-  version: 1.2
+    version: 20100505
+requires:
+  URI: 1.00
+  URI::URL: 5.00
+  perl: v5.6.0
+resources:
+  bugtracker: http://rt.cpan.org/Public/Dist/Display.html?Name=URI-Find
+  homepage: http://search.cpan.org/dist/URI-Find
+  license: http://dev.perl.org/licenses/
+  repository: http://github.com/schwern/uri-find/tree/master
+version: 20100505
diff --git a/SIGNATURE b/SIGNATURE
index 574120f..b201588 100644
--- a/SIGNATURE
+++ b/SIGNATURE
@@ -1,5 +1,5 @@
 This file contains message digests of all files listed in MANIFEST,
-signed via the Module::Signature module, version 0.55.
+signed via the Module::Signature module, version 0.61.
 
 To verify the content in this distribution, first make sure you have
 Module::Signature installed, then type:
@@ -14,25 +14,30 @@ not run its Makefile.PL or Build.PL.
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
-SHA1 1a0c5437cbca894673a84ae33b0455d40cf32a24 Build.PL
-SHA1 34762762a1afa939862f8f35787cf81e01b9a66a Changes
+SHA1 6057930ea78bb75e1c5d52b0a6f164dd667e81b2 Build.PL
+SHA1 b0f8e3b251fb4ccd7864332ddcf9dc54efbd5a4a Changes
 SHA1 3ca0307f0585442c160041b8d8f3472359735108 INSTALL
-SHA1 496f20eefdeee2b954fca6e57442ce984f1999a4 MANIFEST
-SHA1 c4c081b812afdd274d037fe04ba8776c0b831caa MANIFEST.SKIP
-SHA1 286609951d54a65f2f1dbf0b15a2114b946482d7 META.yml
+SHA1 85c3079f3b3525ac33fecc350a23960a624bd6c6 MANIFEST
+SHA1 b58525ce40ec4c85489b4791aa8953fc4b62d5a3 MANIFEST.SKIP
+SHA1 66e3accf3332bc65e0fce615525b5b0f42fec9e0 META.yml
 SHA1 fc72946137d28d945b8c7027b4a49a001ec49cd1 README
 SHA1 7dc0589de524cbd4c983c5cd6e9da58fd474b34a TODO
-SHA1 6d32866bbd081815ec428dcfa01913885da57912 lib/URI/Find.pm
-SHA1 a26bfc871298a5936275722c66dd6611822aebd2 lib/URI/Find/Schemeless.pm
-SHA1 b3b2e47b718a06f48d668ad929818669696c649b t/Find.t
+SHA1 6c25e91b26a506217cac6cb499db78543ad02e29 bin/urifind
+SHA1 6bea554311a1a1c65825b63e6f2feab214f94130 lib/URI/Find.pm
+SHA1 569c655e04e33a10e5424b340ea85c276c772a5a lib/URI/Find/Schemeless.pm
+SHA1 0e60f3713dc4b6ae6fb24e3ce3fb29921e8c3e0c t/Find.t
 SHA1 20f14cb0c5b625cc2040183426c97f0cfad7e148 t/filter.t
+SHA1 1071febaa25419c5cdb2580bf87ee81834e70132 t/html.t
 SHA1 2c057ac42eb47f6b7da78c3b7ebb20b94f33e719 t/is_schemed.t
 SHA1 a0fdf62d822e769d80b229bb88f1a013f6ab0964 t/load-schemeless.t
-SHA1 2ce9adca5502a9a3aa330594d627e99c1d93623f t/rfc3986_appendix_c.t
+SHA1 0c5eb1bda18407bdf26b8831a08cad4a14938082 t/rfc3986_appendix_c.t
+SHA1 6aaf29926da83d3c369cf28969bd6c48df8deff4 t/urifind/find.t
+SHA1 9ff9e4e6fef205eebead0f792da79dedd61b4b7e t/urifind/pod.t
+SHA1 0cb2627de8403934f9893ed2e86145e7c372402c t/urifind/sciencenews
 -----BEGIN PGP SIGNATURE-----
-Version: GnuPG v1.4.9 (Darwin)
+Version: GnuPG v2.0.13 (Darwin)
 
-iEYEARECAAYFAknCm/EACgkQWMohlhD1QycoagCbB8DV2gDdWW1wMPTgVbGKgiMd
-fmMAn2LtDyop0OwyPzLpNM3VKv+unpoY
-=1MtC
+iEYEARECAAYFAkviICwACgkQWMohlhD1Qyd6RQCfRJPEJmwvs3AMmHFNdKgBXf+W
+KiEAn0ghpQ1h1af29aMwazWXcr6of5UF
+=AiB8
 -----END PGP SIGNATURE-----
diff --git a/bin/urifind b/bin/urifind
new file mode 100644
index 0000000..fcc381f
--- /dev/null
+++ b/bin/urifind
@@ -0,0 +1,317 @@
+#!/usr/local/bin/perl -w
+
+# ----------------------------------------------------------------------
+# urifind - find URIs in a document and dump them to STDOUT.
+# Copyright (C) 2003 darren chamberlain <darren at cpan.org>
+# ----------------------------------------------------------------------
+
+use strict;
+
+our $VERSION = 20100505;
+
+use File::Basename qw(basename);
+use Getopt::Long qw(GetOptions);
+use IO::File;
+use URI::Find;
+
+# What to do, and how
+my $help = 0;
+my $version = 0;
+my $sort = 0;
+my $reverse = 0;
+my $unique = 0;
+my $prefix = 0;
+my $noprefix = 0;
+my @pats = ();
+my @schemes = ();
+my $dump = 0;
+
+Getopt::Long::Configure(qw{no_ignore_case bundling});
+GetOptions('s!'   => \$sort,
+           'u!'   => \$unique,
+           'p!'   => \$prefix,
+           'n!'   => \$noprefix,
+           'r!'   => \$reverse,
+           'h!'   => \$help,
+           'v!'   => \$version,
+           'd!'   => sub { $dump = 1 },
+           'D!'   => sub { $dump = 2 },
+           'P=s@' => \@pats,
+           'S=s@' => \@schemes);
+
+if ($help || $version) {
+    my $prog = basename($0);
+
+    if ($help) {
+        print <<HELP;
+$prog - find URIs in a document and dump them to STDOUT.
+
+    $prog [OPTIONS] file1 [file2[, file3[, ...]]]
+
+Options:
+
+    -s          Sort results.
+    -r          Reverse sort results (implies -s).
+    -u          Return unique results only.
+    -n          Don't include filename in output.
+    -p          Include filename in output (0 by default, but 1 if
+                multiple files are included on the command line).
+    -P \$re      Print only lines matching regex '\$re' 
+                (may be specified multiple times).
+    -S \$scheme  Only this scheme (may be specified multiple times).
+    -h          This help screen.
+    -v          Display version and exit.
+    -d          Dump compiled regexes and continue.
+    -D          Dump compiled regexes and exit.
+
+HELP
+    }
+    else {
+        printf "$prog v.%.02f\n", $VERSION;
+    }
+
+    exit(0);
+}
+
+my (@uris, $count);
+unshift @ARGV, \*STDIN unless @ARGV;
+
+if (($prefix + $noprefix) > 1) {
+    my $prog = basename $0;
+    die "Can't specify -p and -n at the same time; try $prog -h\n";
+}
+
+
+# Print filename with matches?  -p / -n
+# If there is more than one file, then show filenames by
+# default, unless explicitly asked not to (-n)
+if (@ARGV > 1) {
+    $prefix = 1 unless $noprefix;
+}
+else {
+    $prefix = 0 unless $prefix;
+}
+
+# Add schemes to the list of regexen
+if (@schemes) {
+    unshift @pats => sprintf '^(\b%s\b):' => join '\b|\b' => @schemes;
+}
+
+# If we are dumping (-d, -D), then dump.  Exit if -D.
+if ($dump) {
+    print STDERR "\$scheme = '" . (defined $pats[0] ? $pats[0] : '') . "'\n";
+    print STDERR "\@pats = ('" . join("', '", @pats) . "')\n";
+    exit if $dump == 2;
+}
+
+# Find the URIs
+for my $argv (@ARGV) {
+    my ($name, $fh, $data);
+
+    $argv = \*STDIN if ($argv eq '-');
+
+    if (ref $argv eq 'GLOB') {
+        local $/;
+        $data = <$argv>;
+        $name = '<stdin>'
+    }
+    else {
+        local $/;
+        $fh = IO::File->new($argv) or die "Can't open $argv: $!";
+        $data = <$fh>;
+        $name = $argv;
+    }
+
+    my $finder = URI::Find->new(sub { push @uris => [ $name, $_[0] ] });
+    $finder->find(\$data);
+}
+
+# Apply patterns, in @pats
+for my $pat (@pats) {
+    @uris = grep { $_->[1] =~ /$pat/ } @uris;
+}
+
+# Remove redundant links
+if ($unique) {
+    my %unique;
+    @uris = grep { ++$unique{$_->[1]} == 1 } @uris;
+}
+
+# Sort links, possibly in reverse
+if ($sort || $reverse) {
+    if ($reverse) {
+        @uris = sort { $b->[1] cmp $a->[1] } @uris;
+    }
+    else {
+        @uris = sort { $a->[1] cmp $b->[1] } @uris;
+    }
+}
+
+# Flatten the arrayrefs
+if ($prefix) {
+    @uris = map { join ': ' => @$_ } @uris;
+}
+else {
+    @uris = map { $_->[1] } @uris;
+}
+
+print map { "$_\n" } @uris;
+
+exit 0;
+
+__END__
+
+=head1 NAME
+
+urifind - find URIs in a document and dump them to STDOUT.
+
+=head1 SYNOPSIS
+
+    $ urifind file
+
+=head1 DESCRIPTION
+
+F<urifind> is a simple script that finds URIs in one or more files
+(using C<URI::Find>), and outputs them to to STDOUT.  That's it.
+
+To find all the URIs in F<file1>, use:
+
+    $ urifind file1
+
+To find the URIs in multiple files, simply list them as arguments:
+
+    $ urifind file1 file2 file3
+
+F<urifind> will read from C<STDIN> if no files are given or if a
+filename of C<-> is specified:
+
+    $ wget http://www.boston.com/ -O - | urifind
+
+When multiple files are listed, F<urifind> prefixes each found URI
+with the file from which it came:
+
+    $ urifind file1 file2
+    file1: http://www.boston.com/index.html
+    file2: http://use.perl.org/
+
+This can be turned on for single files with the C<-p> ("prefix") switch:
+
+    $urifind -p file3
+    file1: http://fsck.com/rt/
+
+It can also be turned off for multiple files with the C<-n> ("no
+prefix") switch:
+
+    $ urifind -n file1 file2
+    http://www.boston.com/index.html
+    http://use.perl.org/
+
+By default, URIs will be displayed in the order found; to sort them
+ascii-betically, use the C<-s> ("sort") option.  To reverse sort them,
+use the C<-r> ("reverse") flag (C<-r> implies C<-s>).
+
+    $ urifind -s file1 file2
+    http://use.perl.org/
+    http://www.boston.com/index.html
+    mailto:webmaster at boston.com
+
+    $ urifind -r file1 file2
+    mailto:webmaster at boston.com
+    http://www.boston.com/index.html
+    http://use.perl.org/
+
+Finally, F<urifind> supports limiting the returned URIs by scheme or
+by arbitrary pattern, using the C<-S> option (for schemes) and the
+C<-P> option.  Both C<-S> and C<-P> can be specified multiple times:
+
+    $ urifind -S mailto file1
+    mailto:webmaster at boston.com
+
+    $ urifind -S mailto -S http file1
+    mailto:webmaster at boston.com
+    http://www.boston.com/index.html
+
+C<-P> takes an arbitrary Perl regex.  It might need to be protected
+from the shell:
+
+    $ urifind -P 's?html?' file1
+    http://www.boston.com/index.html
+
+    $ urifind -P '\.org\b' -S http file4
+    http://www.gnu.org/software/wget/wget.html
+
+Add a C<-d> to have F<urifind> dump the refexen generated from C<-S>
+and C<-P> to C<STDERR>.  C<-D> does the same but exits immediately:
+
+    $ urifind -P '\.org\b' -S http -D 
+    $scheme = '^(\bhttp\b):'
+    @pats = ('^(\bhttp\b):', '\.org\b')
+
+To remove duplicates from the results, use the C<-u> ("unique")
+switch.
+
+=head1 OPTION SUMMARY
+
+=over 4
+
+=item -s
+
+Sort results.
+
+=item -r
+
+Reverse sort results (implies -s).
+
+=item -u
+
+Return unique results only.
+
+=item -n
+
+Don't include filename in output.
+
+=item -p
+
+Include filename in output (0 by default, but 1 if multiple files are
+included on the command line).
+
+=item -P $re
+
+Print only lines matching regex '$re' (may be specified multiple times).
+
+=item -S $scheme
+
+Only this scheme (may be specified multiple times).
+
+=item -h
+
+Help summary.
+
+=item -v
+
+Display version and exit.
+
+=item -d
+
+Dump compiled regexes for C<-S> and C<-P> to C<STDERR>.
+
+=item -D
+
+Same as C<-d>, but exit after dumping.
+
+=back
+
+=head1 AUTHOR
+
+darren chamberlain E<lt>darren at cpan.orgE<gt>
+
+=head1 COPYRIGHT
+
+(C) 2003 darren chamberlain
+
+This library is free software; you may distribute it and/or modify it
+under the same terms as Perl itself.
+
+=head1 SEE ALSO
+
+L<URI::Find>
diff --git a/debian/changelog b/debian/changelog
index 7775abf..3073e66 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+liburi-find-perl (20100505-1) UNRELEASED; urgency=low
+
+  * New upstream release
+
+ -- Dominic Hargreaves <dom at earth.li>  Sun, 13 Jun 2010 21:09:33 +0100
+
 liburi-find-perl (20090319-2) unstable; urgency=low
 
   * Drop perl-modules from Build-Depends since it does not provide
diff --git a/lib/URI/Find.pm b/lib/URI/Find.pm
index e00a4cb..cdaa752 100644
--- a/lib/URI/Find.pm
+++ b/lib/URI/Find.pm
@@ -10,7 +10,7 @@ use strict;
 use base qw(Exporter);
 use vars qw($VERSION @EXPORT);
 
-$VERSION        = 20090319;
+$VERSION        = 20100505;
 @EXPORT         = qw(find_uris);
 
 use constant YES => (1==1);
@@ -21,7 +21,9 @@ use URI::URL;
 
 require URI;
 
-my($schemeRe) = $URI::scheme_re;
+# URI scheme pattern without the non-alpha numerics.
+# Those are extremely uncommon and interfere with the match.
+my($schemeRe) = qr/[a-zA-Z][a-zA-Z0-9]*/;
 my($uricSet)  = $URI::uric;
 
 # We need to avoid picking up 'HTTP::Request::Common' so we have a
@@ -37,7 +39,6 @@ my($cruftSet) = q{]),.'";}; #'#
 
 URI::Find - Find URIs in arbitrary text
 
-
 =head1 SYNOPSIS
 
   require URI::Find;
@@ -46,7 +47,6 @@ URI::Find - Find URIs in arbitrary text
 
   $how_many_found = $finder->find(\$text);
 
-
 =head1 DESCRIPTION
 
 This module does one thing: Finds URIs and URLs in plain text.  It finds
@@ -55,9 +55,7 @@ to be.)  It only finds URIs which include a scheme (http:// or the
 like), for something a bit less strict have a look at
 L<URI::Find::Schemeless|URI::Find::Schemeless>.
 
-For a command-line interface, see Darren Chamberlain's C<urifind>
-script.  It's available from his CPAN directory,
-L<http://www.cpan.org/authors/id/D/DA/DARREN/>.
+For a command-line interface, L<urifind> is provided.
 
 =head2 Public Methods
 
@@ -143,7 +141,7 @@ sub find {
                 $maybe_uri = $3;
                 my $is_uri = do {  # Don't alter $1...
                     $maybe_uri =~ s/\s+//g;
-                    $maybe_uri =~ $uriRe;
+                    $maybe_uri =~ /^$uriRe/;
                 };
 
                 if( $is_uri ) {
@@ -152,7 +150,23 @@ sub find {
                     $replace .= $escape_func->($4);
                 }
                 else {
-                    $replace .= $escape_func->($2.$3.$4);
+                    # the whole text inside of the <...> was not a url, but
+                    # maybe it has a url (like an HTML <a> link)
+                    my $has_uri = do { # Don't alter $1...
+                        $maybe_uri = $3;
+                        $maybe_uri =~ /$uriRe/;
+                    };
+                    if( $has_uri ) {
+                        my $pre = $2;
+                        my $post = $4;
+                        do { $self->find(\$maybe_uri, $escape_func) };
+                        $replace .= $escape_func->($pre);
+                        $replace .= $maybe_uri;
+                        $replace .= $escape_func->($post);
+                    }
+                    else {
+                        $replace .= $escape_func->($2.$3.$4);
+                    }
                 }
             }
             else {
@@ -304,7 +318,15 @@ sub decruft {
     $self->{end_cruft} = '';
 
     if( $orig_match =~ s/([\Q$cruftSet\E]+)$// ) {
-        $self->{end_cruft} = $1;
+        # urls can end with HTML entities if found in HTML so let's put back semicolons
+        # if this looks like the case
+        my $cruft = $1;
+        if( $cruft =~ /^;/ && $orig_match =~ /\&(\#[1-9]\d{1,3}|[a-zA-Z]{2,8})$/) {
+            $orig_match .= ';';
+            $cruft =~ s/^;//;
+        }
+
+        $self->{end_cruft} = $cruft if $cruft;
     }
 
     return $orig_match;
@@ -490,10 +512,12 @@ Greg Bacon, Jeff Pinyan, Roderick Schertler and others.
 
 Roderick Schertler <roderick at argon.org> maintained versions 0.11 to 0.16.
 
+Darren Chamberlain wrote urifind.
+
 
 =head1 LICENSE
 
-Copyright 2000, 2009 by Michael G Schwern E<lt>schwern at pobox.comE<gt>.
+Copyright 2000, 2009-2010 by Michael G Schwern E<lt>schwern at pobox.comE<gt>.
 
 This program is free software; you can redistribute it and/or 
 modify it under the same terms as Perl itself.
@@ -502,7 +526,8 @@ See F<http://www.perlfoundation.org/artistic_license_1_0>
 
 =head1 SEE ALSO
 
-L<URI::Find::Schemeless>, L<URI::URL>, L<URI>, RFC 3986 Appendix C
+L<urifind>, L<URI::Find::Schemeless>, L<URI::URL>, L<URI>,
+RFC 3986 Appendix C
 
 =cut
 
diff --git a/lib/URI/Find/Schemeless.pm b/lib/URI/Find/Schemeless.pm
index e6042b5..e664318 100644
--- a/lib/URI/Find/Schemeless.pm
+++ b/lib/URI/Find/Schemeless.pm
@@ -12,7 +12,7 @@ use base qw(URI::Find);
 use URI::Find ();
 
 use vars qw($VERSION);
-$VERSION = 20090319;
+$VERSION = 20100505;
 
 my($dnsSet) = 'A-Za-z0-9-';
 
@@ -73,9 +73,7 @@ sub schemeless_uri_re {
            }x;
 }
 
-=over
-
-=item B<top_level_domain_re>
+=head3 top_level_domain_re
 
   my $tld_re = $self->top_level_domain_re;
 
diff --git a/t/Find.t b/t/Find.t
index b47bc80..49105d9 100644
--- a/t/Find.t
+++ b/t/Find.t
@@ -109,6 +109,9 @@ BEGIN {
           '<x>intag.com</x>'     => [[S => 'http://intag.com/'  ]],
           '[mailto:somebody at company.ext]' => 'mailto:somebody at company.ext',
           'HTtp://MIXED-Case.Com' => 'http://mixed-case.com/',
+          "The technology of magnetic energy has become so powerful an entire ".
+          "house can...http://bit.ly/8yEdeb"
+            => "http://bit.ly/8yEdeb",
 
           # False tests
           'HTTP::Request::Common'                       => [],
diff --git a/t/html.t b/t/html.t
new file mode 100644
index 0000000..8330858
--- /dev/null
+++ b/t/html.t
@@ -0,0 +1,38 @@
+#!/usr/bin/perl -w
+use strict;
+use warnings;
+
+my $Example = <<"END";
+
+Yes, Jim, I found it under "http://www.w3.org/Addressing/",
+but you can probably pick it up from <a href="ftp://foo.example.com/rfc/">the RFC</a>. 
+Note the <a class="warning" href="http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING" target="_blank">warning</a>.
+Also <foo bar>.
+<a class="junk" href="http://google.com/search?q=<html>">Search for some entities</a>.
+END
+
+# Which should find these URIs
+my @Uris = (
+      "http://www.w3.org/Addressing/",
+      "ftp://foo.example.com/rfc/",
+      "http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING",
+      "http://google.com/search?q=<html>",
+);
+
+use Test::More tests => 5;
+use URI::Find;
+
+my @found;
+my $finder = URI::Find->new(sub {
+    my($uri) = @_;
+    push @found, $uri;
+    return "Link " . scalar @found;
+    
+});
+$finder->find(\$Example);
+
+is_deeply \@found, \@Uris, "found links in HTML";
+like($Example, qr/"Link 1"/, 'link 1 replaced');
+like($Example, qr/<a href="Link 2"/, 'link 2 replaced');
+like($Example, qr/<a class="warning" href="Link 3"/, 'link 3 replaced');
+like($Example, qr/<a class="junk" href="Link 4"/, 'link 4 replaced');
diff --git a/t/rfc3986_appendix_c.t b/t/rfc3986_appendix_c.t
index da170fc..a4319cc 100644
--- a/t/rfc3986_appendix_c.t
+++ b/t/rfc3986_appendix_c.t
@@ -10,7 +10,7 @@ my $Example = <<"END";
 Yes, Jim, I found it under "http://www.w3.org/Addressing/",
 but you can probably pick it up from <ftp://foo.example.
 com/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
-ietf/uri/historical.html#WARNING>.
+ietf/uri/historical.html#WARNING>. Also <foo bar>.
 END
 
 # Which should find these URIs
@@ -20,15 +20,18 @@ my @Uris = (
       "http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING",
 );
 
-
-use Test::More tests => 1;
+use Test::More tests => 4;
 use URI::Find;
 
 my @found;
 my $finder = URI::Find->new(sub {
     my($uri) = @_;
     push @found, $uri;
+    return "Link " . scalar @found;
 });
 $finder->find(\$Example);
 
 is_deeply \@found, \@Uris, "RFC 3986 Appendix C example";
+like($Example, qr/"Link 1"/, 'replaced link 1');
+like($Example, qr/<Link 2>/, 'replaced link 2');
+like($Example, qr/<Link 3>/, 'replaced link 3');
diff --git a/t/urifind/find.t b/t/urifind/find.t
new file mode 100644
index 0000000..2ff271d
--- /dev/null
+++ b/t/urifind/find.t
@@ -0,0 +1,44 @@
+#!/usr/bin/perl -w
+# vim:set ft=perl:
+
+use strict;
+use Test::More;
+use File::Spec;
+
+plan tests => 13;
+
+ok(my $ifile = File::Spec->catfile(qw(t urifind sciencenews)),
+    "Test file found");
+my $urifind = File::Spec->catfile(qw(blib script urifind));
+my @data = `$^X $urifind $ifile`;
+
+is(@data, 13, "Correct number of elements");
+is((grep /mailto:/ => @data), 4, "Found 4 mailto links");
+is((grep /http:/ => @data),   9, "Found 9 mailto links");
+
+ at data = `$^X $urifind $ifile -p`;
+my $count = 0;
+is(@data, 13, "*Still* correct number of elements");
+is((grep /^\Q$ifile/ => @data), @data,
+    "All elements are prefixed with the path when $urifind invoked with -p");
+
+ at data = `$^X $urifind -n $ifile /dev/null`;
+is(@data, 13, "*Still* correct number of elements");
+is((grep !/^\Q$ifile/ => @data), (@data),
+    "All elements are not prefixed with the path when ($urifind,".
+    " '/dev/null') invoked with -n");
+
+ at data = `$^X $urifind -S http $ifile`;
+is(@data, 9, "Correct number of 'http' elements");
+
+ at data = `$^X $urifind -S mailto $ifile`;
+is(@data, 4, "Correct number of 'mailto' elements");
+
+ at data = `$^X $urifind -S mailto -S http $ifile`;
+is(@data, 13, "Correct number of ('http', 'mailto') elements");
+
+ at data = `$^X $urifind < $ifile`;
+is(@data, 13, "Correct number elements when given data on STDIN");
+
+ at data = `$^X $urifind -S http -P \.org $ifile`;
+is(@data, 8, "Correct number elements when invoked with -P \.org -S http");
diff --git a/t/urifind/pod.t b/t/urifind/pod.t
new file mode 100644
index 0000000..51b26e5
--- /dev/null
+++ b/t/urifind/pod.t
@@ -0,0 +1,17 @@
+#!/usr/bin/perl
+# vim: set ft=perl:
+# Stolen from Andy Lester, from 
+# <http://use.perl.org/~petdance/journal/12391>
+
+use Test::More;
+use File::Spec;
+use strict;
+
+eval "use Test::Pod 0.95";
+
+if ($@) {
+    plan skip_all => "Test::Pod v0.95 required for testing POD";
+} else {
+    plan tests => 1;
+    Test::Pod::pod_file_ok(File::Spec->catfile(qw(blib script urifind)));
+}
diff --git a/t/urifind/sciencenews b/t/urifind/sciencenews
new file mode 100644
index 0000000..82508af
--- /dev/null
+++ b/t/urifind/sciencenews
@@ -0,0 +1,95 @@
+From eletter-return-306-dlc+sciencenews=sevenroot.org at lists.sciencenews.org Mon Jul  7 07:33:57 2003
+Return-Path: <eletter-return-306-dlc+sciencenews=sevenroot.org at lists.sciencenews.org>
+Received: from [66.33.90.123] (helo=ns2.dbinteractive.com)
+	by efwd.dnsix.com with smtp (Exim 3.36 #1)
+	id 19YtS5-0007Qd-00
+	for xxxxx at xxxxxxxxx.xxx Sat, 05 Jul 2003 13:16:09 -0700
+Received: (qmail 27761 invoked by uid 502); 5 Jul 2003 16:00:01 -0000
+Mailing-List: contact eletter-help at lists.sciencenews.org; run by ezmlm
+From: e-LETTER at lists.sciencenews.org
+Subject: Science News e-LETTER
+Precedence: bulk
+X-No-Archive: yes
+List-Post: <mailto:eletter at lists.sciencenews.org>
+List-Help: <mailto:eletter-help at lists.sciencenews.org>
+List-Unsubscribe: <mailto:eletter-unsubscribe at lists.sciencenews.org>
+List-Subscribe: <mailto:eletter-subscribe at lists.sciencenews.org>
+Delivered-To: mailing list eletter at lists.sciencenews.org
+Message-Id: <E19YtS5-0007Qd-00 at efwd.dnsix.com>
+Bcc:
+Date: Sat, 05 Jul 2003 13:16:09 -0700
+X-Bogosity: Ham, spamicity=0.000000, algorithm=fisher
+Status: RO
+Content-Length: 3542
+Lines: 69
+
+WEEKLY e-LETTER from SCIENCE NEWS
+July 5, 2003
+
+This week's articles focus on the detection of five-quark particles, the rise of dengue fever in the Americas, the first example of an animal navigating by moonlight polarity, the use of viruses, bacteria, and fungi to engineer new structures, and more. The cover story looks at how human ancestors settled into one ecosystem after another. Food for Thought ponders the anticholesterol benefits of soy greens. MathTrek puzzles over alphamagic squares.
+
+==================================
+Science News is an award-winning weekly newsmagazine covering the most important research in all fields of science. Published since 1922, its 16 pages are packed with short, accurate articles that appeal to both general readers and scientists.
+----------------------------------
+To subscribe to Science News magazine, go to www.sciencenews.org
+==================================
+
+THIS WEEK'S FEATURED ARTICLES:
+
+[Physics]
+Wild Bunch: First five-quark particle turns up
+Physicists have uncovered strong evidence for a family of five-quark particles after decades of finding no subatomic particles with more than three of the fundamental building blocks known as quarks.
+http://www.sciencenews.org/20030705/fob1.asp
+
+[Behavior]
+Till IL-6 Do Us Part: Elderly caregivers show harmful immune effect
+Elderly people caring for their incapacitated spouses experienced dramatic average increases in the blood concentration of a protein involved in immune regulation, a trend that puts them at risk for a variety of serious illnesses.
+http://www.sciencenews.org/20030705/fob5.asp
+
+[Materials Science]
+Microbial Materials: Scientists co-opt viruses, bacteria, and fungi to build new structures
+Microorganisms can be coaxed into producing high-tech components and can themselves serve as valuable ingredients in new classes of materials.
+
+http://www.sciencenews.org/20030705/bob8.asp
+
+THIS WEEK'S ONLINE FEATURES:
+
+[MATHTREK]
+Alphamagic Squares
+http://www.sciencenews.org/20030705/mathtrek.asp
+
+[FOOD FOR THOUGHT]
+Soy Greens--The Coming Health Food?
+http://www.sciencenews.org/20030705/food.asp
+
+----------------------------------
+To subscribe to Science News magazine, go to www.sciencenews.org
+----------------------------------
+
+Week of July 5, 2003; Vol. 164 No. 1
+
+THIS WEEK'S TABLE OF CONTENTS: http://www.sciencenews.org/20030705/toc.asp
+
+References and sources for all articles are available online at www.sciencenews.org
+
+***********************************
+REGISTERED SUBSCRIBERS to the print edition of Science News also have online access to the full text of the following articles:
+
+[Biology]
+A Matter of Taste: Mutated fruit flies bypass the salt
+By creating mutant fruit flies with an impaired capacity to taste salt, researchers have identified several genes that contribute to this sensory system in insects.
+http://www.sciencenews.org/20030705/fob2.asp
+
+[Biomedicine]
+Lethal Emergence: Tracing the rise of dengue fever in the Americas
+Using the genetics of viruses, scientists have tracked a virulent form of dengue virus in Latin America back to its roots in India.
+http://www.sciencenews.org/20030705/fob3.asp
+
+[Zoology]
+Moonlighting: Beetles navigate by lunar polarity
+A south African dung beetle is the first animal found to align its path by detecting the polarization of moonlight.
+http:/%
+---------------------------------------------------------------------
+To unsubscribe, e-mail: eletter-unsubscribe at lists.sciencenews.org
+For additional commands, e-mail: eletter-help at lists.sciencenews.org
+

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-perl/packages/liburi-find-perl.git



More information about the Pkg-perl-cvs-commits mailing list