[liburi-find-perl] 04/31: new upstream release
gregor herrmann
gregoa at debian.org
Sat Jul 25 17:41:46 UTC 2015
This is an automated email from the git hooks/post-receive script.
gregoa pushed a commit to branch master
in repository liburi-find-perl.
commit d051fb19eec6e898202f3ae7d8610360bf4a1e5c
Author: Dominic Hargreaves <dom at earth.li>
Date: Sun Jun 21 17:05:03 2009 +0000
new upstream release
---
Build.PL | 38 +++++++
Changes | 58 ++++++++--
INSTALL | 24 +++-
MANIFEST | 18 +--
MANIFEST.SKIP | 37 ++++++-
META.yml | 42 +++++--
Makefile.PL | 47 --------
README | 87 ++++++++++++---
SIGNATURE | 38 +++++++
TODO | 6 +-
debian/changelog | 6 +
debian/rules | 10 +-
lib/URI/Find.pm | 267 ++++++++++++++++++++++++++++-----------------
lib/URI/Find/Schemeless.pm | 58 +++++-----
t/Find.t | 197 +++++++++++++--------------------
t/filter.t | 39 +++++++
t/is_schemed.t | 17 +++
t/load-schemeless.t | 17 +--
t/rfc3986_appendix_c.t | 34 ++++++
19 files changed, 676 insertions(+), 364 deletions(-)
diff --git a/Build.PL b/Build.PL
new file mode 100644
index 0000000..9136b16
--- /dev/null
+++ b/Build.PL
@@ -0,0 +1,38 @@
+#!/usr/bin/perl -w
+
+use Module::Build 0.30;
+
+require 5.006;
+
+my $build = Module::Build->new(
+ module_name => 'URI::Find',
+
+ configure_requires => {
+ Module::Build => '0.30'
+ },
+
+ build_requires => {
+ Test::More => '0.82',
+ Module::Build => '0.30',
+ },
+
+ requires => {
+ perl => '5.6.0',
+ URI => '1.00',
+ URI::URL => '5.00',
+ },
+
+ license => 'perl',
+
+ dist_author => 'Michael G Schwern <schwern at pobox.com>',
+
+ meta_merge => {
+ resources => {
+ homepage => 'http://search.cpan.org/dist/URI-Find',
+ bugtracker => 'http://rt.cpan.org/Public/Dist/Display.html?Name=URI-Find',
+ repository => 'http://github.com/schwern/uri-find/tree/master',
+ }
+ }
+);
+
+$build->create_build_script;
diff --git a/Changes b/Changes
index fa2755a..78847da 100644
--- a/Changes
+++ b/Changes
@@ -1,29 +1,66 @@
Revision history for Perl module URI::Find.
+20090319 Thu Mar 19 12:17:53 PDT 2009
+ Bug Fixes
+ * Schemeless now ignores the case of the TLD.
+
+ New Features
+ * Updated the list of accepted domains for finding schemeless URIs
+ from the latest ICANN list.
+
+ Docs
+ * Add LICENSE section
+ * Remove wildly out of date CAVEATS
+ * Added an example of how to get a list of all URIs.
+ * Updated INSTALL section to reflect new dependencies and Module::Build
+ installation process
+ * Regenerated the README file
+
+
+20090316 Mon Mar 16 16:18:10 PDT 2009
+ New Features
+ * Added optional replacement function to find(). Now you
+ can not only replace URLs found, but also the rest of the text around
+ them in one fell swoop. (Mike Schilli) [rt.cpan.org 20486]
+ * Whitespace inside <...> is now ignored as per the suggestion of
+ RFC 3986 appendix C. [rt.cpan.org 20483]
+
+ Other
+ * Michael G Schwern is now primary maintainer again. Thanks for all your
+ work, Roderick!
+ * Repository moved to http://github.com/schwern/uri-find
+ * Now requires Test::More
+ * Verisoning scheme changed to ISO date integers
+ * Minimum Perl version is now 5.6.0.
+
+
0.16 Fri Jul 22 06:00:24 EDT 2005
- Oops, make the URI::Find::Schemeless->top_level_domain_re case
insensitive, as it should be and the docs claimed it was. Thanks
to Todd Eigenschink.
+
0.15 Tue Mar 22 07:23:17 EST 2005
- Have all functions croak if invoked with the wrong number of
arguments. Add URI::Find->badinvo.
- https://rt.cpan.org/NoAuth/Bug.html?id=1845
+ https://rt.cpan.org/NoAuth/Bug.html?id=1845
- Mention DARREN's urifind script in the man page.
- Oops, URI::URL::strict was turned on and left on. Put it back the
way you found it. Thanks to Chris Nandor.
- https://rt.cpan.org/NoAuth/Bug.html?id=11906
+ https://rt.cpan.org/NoAuth/Bug.html?id=11906
- Schemeless.pm:
- - Find '<x>intag.com</x>'.
- - Get $tldRe from a new class method, ->top_level_domain_re.
- - Update top level domain list.
+ - Find '<x>intag.com</x>'.
+ - Get $tldRe from a new class method, ->top_level_domain_re.
+ - Update top level domain list.
+
-0.13 Sat Oct 9 08:20:04 EDT 2004
+0.14 Sat Oct 9 08:20:04 EDT 2004
- Add copyright notice.
- Add ] to main $cruftSet, } to schemeless $cruftSet, for
[http://square.com] and {brace.com}.
- quotemeta() $cruftSet.
+
0.13 Mon Jul 1 10:37:54 EDT 2002
- Don't find any schemeless URIs with a plain URI::Find. Previously
it'd find ones which started with "ftp." and "www.", but it was
@@ -31,22 +68,27 @@ Revision history for Perl module URI::Find.
- Have schemeless_to_schemed use http:// except in the specific case
in which it uses ftp://. Remove URI::Find::Schemeless's version.
+
0.12 Wed Mar 20 14:39:21 EST 2002
- Improve the "wrap each URI found in an HTML anchor" example.
- Release a new version so CPAN sees the maintainer change.
+
0.11 Thu Jul 26 14:43:49 EDT 2001
- Michael passed the module to Roderick for maintenance.
- Improve test suite.
- Tweak URI::Find::Schemeless not to find Foo.p[ml].
+
0.10 Mon Jul 10 20:14:08 EDT 2000
- Rearchitected the internals to allow simple subclassing
- Added URI::Find::Schemeless (thanks Roderick)
+
0.04 Sat Feb 26 09:05:11 GMT 2000
- Added # to the uric set of characters so HTML anchors are caught.
+
0.03 Tue Feb 1 16:15:22 EST 2000
- Added some heuristic discussion to the docs.
- Added some heuristics to avoid picking up perl module names
@@ -55,11 +97,11 @@ Revision history for Perl module URI::Find.
- Handling the <URL:...> case better as suggested in RFC 2396 Apdx E
- Added ; to the cruft heuristic
+
0.02 Tue Feb 1 13:11:56 EST 2000
- Added heuristic to handle 'URL:http://www.foo.com'
- Added heuristic to handle trailing quotes.
+
0.01 Mon Jan 31 19:12:23 EST 2000
- First working version released to CPAN.
-
-$Id: Changes,v 1.11 2005/07/22 10:02:35 roderick Exp $
diff --git a/INSTALL b/INSTALL
index b5a7e5c..f6e544d 100644
--- a/INSTALL
+++ b/INSTALL
@@ -8,21 +8,33 @@ HOW DO I INSTALL IT?
To install this module, cd to the directory that contains this README
file and type the following:
- perl Makefile.PL
- make
- make test
- make install
+ perl Build.PL
+ ./Build
+ ./Build test
+ ./Build install
To install this module into a specific directory, do:
- perl Makefile.PL PREFIX=/name/of/the/directory
+ perl Build.PL --install_base /name/of/the/directory
...the rest is the same...
Please also read the perlmodinstall man page, if available.
+WHAT VERSION OF PERL DO I NEED?
+
+ perl 5.6.0 or higher
+
+
WHAT MODULES DO I NEED?
+To build, test and install the module you need:
+
+ Module::Build 0.30 or higher
+ Test::More 0.82 or higher
+
+To run the module you need:
+
URI.pm 1.00 or higher
URI::URL 5.00 or higher
-$Id: INSTALL,v 1.3 2001/07/27 12:13:40 roderick Exp $
+They can all be found on http://search.cpan.org/ or by running your CPAN shell.
\ No newline at end of file
diff --git a/MANIFEST b/MANIFEST
index d25c303..d0a4422 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -1,14 +1,16 @@
- $Id: MANIFEST,v 1.7 2004/10/10 21:31:50 roderick Exp $
-
+Build.PL
Changes
INSTALL
-MANIFEST
-MANIFEST.SKIP
-META.yml Module meta-data (added by MakeMaker)
-Makefile.PL
-README
-TODO
lib/URI/Find.pm
lib/URI/Find/Schemeless.pm
+MANIFEST This list of files
+MANIFEST.SKIP
+META.yml
+README
+t/filter.t
t/Find.t
+t/is_schemed.t
t/load-schemeless.t
+t/rfc3986_appendix_c.t
+TODO
+SIGNATURE Added here by Module::Build
diff --git a/MANIFEST.SKIP b/MANIFEST.SKIP
index 99b7b83..9045d98 100644
--- a/MANIFEST.SKIP
+++ b/MANIFEST.SKIP
@@ -1,5 +1,36 @@
-# $Id: MANIFEST.SKIP,v 1.1 2001/07/27 12:05:20 roderick Exp $
-(^|/)CVS/
+#!start included /usr/local/perl/5.10.0/lib/5.10.0/ExtUtils/MANIFEST.SKIP
+# Avoid version control files.
+\bRCS\b
+\bCVS\b
+\bSCCS\b
+,v$
+\B\.svn\b
+\B\.git\b
+\B\.gitignore\b
+\b_darcs\b
+
+# Avoid Makemaker generated and utility files.
+\bMANIFEST\.bak
+\bMakefile$
+\bblib/
+\bMakeMaker-\d
+\bpm_to_blib\.ts$
+\bpm_to_blib$
+\bblibdirs\.ts$ # 6.18 through 6.25 generated this
+
+# Avoid Module::Build generated and utility files.
+\bBuild$
+\b_build/
+
+# Avoid temp and backup files.
~$
-^Makefile$
+\.old$
+\#$
+\b\.#
+\.bak$
+
+# Avoid Devel::Cover files.
+\bcover_db\b
+#!end included /usr/local/perl/5.10.0/lib/5.10.0/ExtUtils/MANIFEST.SKIP
+
diff --git a/META.yml b/META.yml
index 75a188d..643a5d0 100644
--- a/META.yml
+++ b/META.yml
@@ -1,12 +1,32 @@
-# http://module-build.sourceforge.net/META-spec.html
-#XXXXXXX This is a prototype!!! It will change in the future!!! XXXXX#
-name: URI-Find
-version: 0.16
-version_from: lib/URI/Find.pm
-installdirs: site
+---
+name: URI-Find
+version: 20090319
+author:
+ - 'Michael G Schwern <schwern at pobox.com>'
+abstract: Find URIs in arbitrary text
+license: perl
+resources:
+ bugtracker: http://rt.cpan.org/Public/Dist/Display.html?Name=URI-Find
+ homepage: http://search.cpan.org/dist/URI-Find
+ license: http://dev.perl.org/licenses/
+ repository: http://github.com/schwern/uri-find/tree/master
+configure_requires:
+ Module::Build: 0.30
requires:
- URI: 1.00
- URI::URL: 5.00
-
-distribution_type: module
-generated_by: ExtUtils::MakeMaker version 6.17
+ URI: 1.00
+ URI::URL: 5.00
+ perl: 5.6.0
+build_requires:
+ Module::Build: 0.30
+ Test::More: 0.82
+provides:
+ URI::Find:
+ file: lib/URI/Find.pm
+ version: 20090319
+ URI::Find::Schemeless:
+ file: lib/URI/Find/Schemeless.pm
+ version: 20090319
+generated_by: Module::Build version 0.3103
+meta-spec:
+ url: http://module-build.sourceforge.net/META-spec-v1.2.html
+ version: 1.2
diff --git a/Makefile.PL b/Makefile.PL
deleted file mode 100644
index b57055f..0000000
--- a/Makefile.PL
+++ /dev/null
@@ -1,47 +0,0 @@
-# $Id: Makefile.PL,v 1.6 2004/10/11 14:22:50 roderick Exp $
-
-# A template for Makefile.PL used by Arena Networks.
-# - Set the $PACKAGE variable to the name of your module.
-# - Set $LAST_API_CHANGE to reflect the last version you changed the API
-# of your module.
-# - Fill in your dependencies in PREREQ_PM
-# Alternatively, you can say the hell with this and use h2xs.
-
-use ExtUtils::MakeMaker;
-# See lib/ExtUtils/MakeMaker.pm for details of how to influence
-# the contents of the Makefile that is written.
-
-require 5.005;
-
-$PACKAGE = 'URI::Find';
-($PACKAGE_FILE) = $PACKAGE =~ /(?:\::)?([^:]+)$/;
-$LAST_API_CHANGE = 0;
-
-eval {
- eval "require $PACKAGE";
-};
-
-unless ($@) { # Make sure we did find the module.
- print <<"CHANGE_WARN" if ${$PACKAGE.'::VERSION'} < $LAST_API_CHANGE;
-
-NOTE: There have been API changes between this version and any older
-than version $LAST_API_CHANGE! Please read the Changes file if you
-are upgrading from a version older than $LAST_API_CHANGE.
-
-CHANGE_WARN
-}
-
-WriteMakefile(
- NAME => $PACKAGE,
- VERSION_FROM => "lib/URI/$PACKAGE_FILE.pm", # finds $VERSION
- PREREQ_PM => { URI => '1.00',
- URI::URL => '5.00',
- },
- 'dist' => { COMPRESS => 'gzip -9',
- SUFFIX => '.gz',
- DIST_DEFAULT => 'all tardist',
- PREOP => '$(MAKE) ci',
- CI => 'cvs commit',
- RCS_LABEL => 'cvs tag v$(VERSION_SYM)',
- },
-);
diff --git a/README b/README
index fd47f0b..716a016 100644
--- a/README
+++ b/README
@@ -1,31 +1,86 @@
NAME
- URI::Find - Find URIs in arbitrary text
+ URI::Find - Find URIs in arbitrary text
SYNOPSIS
- require URI::Find;
+ require URI::Find;
- my $finder = URI::Find->new(\&callback);
+ my $finder = URI::Find->new(\&callback);
- $how_many_found = $finder->find(\$text);
+ $how_many_found = $finder->find(\$text);
DESCRIPTION
- This module does one thing: Finds URIs and URLs in plain
- text. It finds them quickly and it finds them all (or
- what URI::URL considers a URI to be.)
+ This module does one thing: Finds URIs and URLs in plain text. It finds
+ them quickly and it finds them all (or what URI::URL considers a URI to
+ be.) It only finds URIs which include a scheme (http:// or the like),
+ for something a bit less strict have a look at URI::Find::Schemeless.
-AUTHOR
- Michael G Schwern <schwern at pobox.com> with insight from
- Uri Gutman and Jeff Pinyan.
+ For a command-line interface, see Darren Chamberlain's "urifind" script.
+ It's available from his CPAN directory,
+ <http://www.cpan.org/authors/id/D/DA/DARREN/>.
+
+EXAMPLES
+ Store a list of all URIs (normalized) in the document.
+
+ my @uris;
+ my $finder = URI::Find->new(sub {
+ my($uri) = shift;
+ push @uris, $uri;
+ });
+ $finder->find(\$text);
+
+ Print the original URI text found and the normalized representation.
+
+ my $finder = URI::Find->new(sub {
+ my($uri, $orig_uri) = @_;
+ print "The text '$orig_uri' represents '$uri'\n";
+ return $orig_uri;
+ });
+ $finder->find(\$text);
+
+ Check each URI in document to see if it exists.
- Currently maintained by Roderick Schertler <roderick at argon.org>.
+ use LWP::Simple;
+
+ my $finder = URI::Find->new(sub {
+ my($uri, $orig_uri) = @_;
+ if( head $uri ) {
+ print "$orig_uri is okay\n";
+ }
+ else {
+ print "$orig_uri cannot be found\n";
+ }
+ return $orig_uri;
+ });
+ $finder->find(\$text);
+
+ Turn plain text into HTML, with each URI found wrapped in an HTML
+ anchor.
+
+ use CGI qw(escapeHTML);
+ use URI::Find;
+
+ my $finder = URI::Find->new(sub {
+ my($uri, $orig_uri) = @_;
+ return qq|<a href="$uri">$orig_uri</a>|;
+ });
+ $finder->find(\$text, \&escapeHTML);
+ print "<pre>$text</pre>";
+
+AUTHOR
+ Michael G Schwern <schwern at pobox.com> with insight from Uri Gutman, Greg
+ Bacon, Jeff Pinyan, Roderick Schertler and others.
+ Roderick Schertler <roderick at argon.org> maintained versions 0.11 to
+ 0.16.
-See the INSTALL document for more details.
+LICENSE
+ Copyright 2000, 2009 by Michael G Schwern <schwern at pobox.com>.
+ This program is free software; you can redistribute it and/or modify it
+ under the same terms as Perl itself.
-Copyright (c) 2000 Michael G. Schwern. All rights reserved. This
-program is free software; you can redistribute it and/or modify it
-under the same terms as Perl itself.
+ See http://www.perlfoundation.org/artistic_license_1_0
+SEE ALSO
+ URI::Find::Schemeless, URI::URL, URI, RFC 3986 Appendix C
-$Id: README,v 1.5 2004/10/09 12:20:04 roderick Exp $
diff --git a/SIGNATURE b/SIGNATURE
new file mode 100644
index 0000000..574120f
--- /dev/null
+++ b/SIGNATURE
@@ -0,0 +1,38 @@
+This file contains message digests of all files listed in MANIFEST,
+signed via the Module::Signature module, version 0.55.
+
+To verify the content in this distribution, first make sure you have
+Module::Signature installed, then type:
+
+ % cpansign -v
+
+It will check each file's integrity, as well as the signature's
+validity. If "==> Signature verified OK! <==" is not displayed,
+the distribution may already have been compromised, and you should
+not run its Makefile.PL or Build.PL.
+
+-----BEGIN PGP SIGNED MESSAGE-----
+Hash: SHA1
+
+SHA1 1a0c5437cbca894673a84ae33b0455d40cf32a24 Build.PL
+SHA1 34762762a1afa939862f8f35787cf81e01b9a66a Changes
+SHA1 3ca0307f0585442c160041b8d8f3472359735108 INSTALL
+SHA1 496f20eefdeee2b954fca6e57442ce984f1999a4 MANIFEST
+SHA1 c4c081b812afdd274d037fe04ba8776c0b831caa MANIFEST.SKIP
+SHA1 286609951d54a65f2f1dbf0b15a2114b946482d7 META.yml
+SHA1 fc72946137d28d945b8c7027b4a49a001ec49cd1 README
+SHA1 7dc0589de524cbd4c983c5cd6e9da58fd474b34a TODO
+SHA1 6d32866bbd081815ec428dcfa01913885da57912 lib/URI/Find.pm
+SHA1 a26bfc871298a5936275722c66dd6611822aebd2 lib/URI/Find/Schemeless.pm
+SHA1 b3b2e47b718a06f48d668ad929818669696c649b t/Find.t
+SHA1 20f14cb0c5b625cc2040183426c97f0cfad7e148 t/filter.t
+SHA1 2c057ac42eb47f6b7da78c3b7ebb20b94f33e719 t/is_schemed.t
+SHA1 a0fdf62d822e769d80b229bb88f1a013f6ab0964 t/load-schemeless.t
+SHA1 2ce9adca5502a9a3aa330594d627e99c1d93623f t/rfc3986_appendix_c.t
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.4.9 (Darwin)
+
+iEYEARECAAYFAknCm/EACgkQWMohlhD1QycoagCbB8DV2gDdWW1wMPTgVbGKgiMd
+fmMAn2LtDyop0OwyPzLpNM3VKv+unpoY
+=1MtC
+-----END PGP SIGNATURE-----
diff --git a/TODO b/TODO
index 1c151c1..0f9dea1 100644
--- a/TODO
+++ b/TODO
@@ -1,8 +1,10 @@
-$Id: TODO,v 1.2 2005/03/22 16:04:54 roderick Exp $
-
- parameterize top level domain list in Schemeless.pm?
- shouldn't have picked this out:
$url = 'http://'.rand(1000000).'@anonymizer.com/'.$url (/url 63);
- find email addresses
- <freeside> $text =~ s((?<![\/:])([.-_\w]+\@(\w+\.)+\w+))(<A HREF="mailto:$1">$1</A>)g;
- see also Email::Find
+- I'd think this should either be leaving off the parenthesized part or
+ including the close paren:
+ http://www.tbjck.com(86-10-85893372)
+ -> http://www.tbjck.com(86-10-85893372/
diff --git a/debian/changelog b/debian/changelog
index 1dd9208..3d1a685 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+liburi-find-perl (20090319-1) UNRELEASED; urgency=low
+
+ * New upstream release
+
+ -- Dominic Hargreaves <dom at earth.li> Sun, 21 Jun 2009 18:04:50 +0100
+
liburi-find-perl (0.16-2) unstable; urgency=low
* Fix debian/rules rmdir bug (closes: #467926)
diff --git a/debian/rules b/debian/rules
index 33a3655..7f55fef 100755
--- a/debian/rules
+++ b/debian/rules
@@ -33,8 +33,8 @@ build-stamp:
dh_testdir
# Add commands to compile the package here
- $(PERL) Makefile.PL INSTALLDIRS=vendor
- $(MAKE) OPTIMIZE="$(CFLAGS)"
+ $(PERL) Build.PL --force installdirs=vendor
+ OPTIMIZE="$(OPTIMIZE)" LD_RUN_PATH="" ./Build
touch build-stamp
@@ -43,7 +43,7 @@ clean:
dh_testroot
# Add commands to clean up after the build process here
- [ ! -f Makefile ] || $(MAKE) distclean
+ ./Build realclean || true
dh_clean build-stamp install-stamp
@@ -53,8 +53,8 @@ install-stamp: build-stamp
dh_testroot
dh_clean -k
- $(MAKE) test
- $(MAKE) install DESTDIR=$(TMP) PREFIX=/usr
+ ./Build test
+ ./Build install destdir=$(TMP)
# As this is a architecture independent package, we are not supposed to install
# stuff to /usr/lib. MakeMaker creates the dirs, we delete them from the deb:
diff --git a/lib/URI/Find.pm b/lib/URI/Find.pm
index 40f3f61..e00a4cb 100644
--- a/lib/URI/Find.pm
+++ b/lib/URI/Find.pm
@@ -1,24 +1,22 @@
-# $Id: Find.pm,v 1.16 2005/07/22 10:02:37 roderick Exp $
-#
-# Copyright (c) 2000 Michael G. Schwern. All rights reserved. This
-# program is free software; you can redistribute it and/or modify it
-# under the same terms as Perl itself.
+# Copyright (c) 2000, 2009 Michael G. Schwern. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
package URI::Find;
-require 5.005;
+require 5.006;
use strict;
use base qw(Exporter);
use vars qw($VERSION @EXPORT);
-$VERSION = '0.16';
- at EXPORT = qw(find_uris);
+$VERSION = 20090319;
+ at EXPORT = qw(find_uris);
use constant YES => (1==1);
use constant NO => !YES;
-use Carp qw(croak);
+use Carp qw(croak);
use URI::URL;
require URI;
@@ -37,7 +35,7 @@ my($cruftSet) = q{]),.'";}; #'#
=head1 NAME
- URI::Find - Find URIs in arbitrary text
+URI::Find - Find URIs in arbitrary text
=head1 SYNOPSIS
@@ -96,13 +94,28 @@ sub new {
$text is a string to search and possibly modify with your callback.
+Alternatively, C<find> can be called with a replacement function for
+the rest of the text:
+
+ use CGI qw(escapeHTML);
+ # ...
+ my $how_many_found = $finder->find(\$text, \&escapeHTML);
+
+will not only call the callback function for every URL found (and
+perform the replacement instructions therein), but also run the rest
+of the text through C<escapeHTML()>. This makes it easier to turn
+plain text which contains URLs into HTML (see example below).
+
=cut
sub find {
- @_ == 2 || __PACKAGE__->badinvo;
- my($self, $r_text) = @_;
+ @_ == 2 || @_ == 3 || __PACKAGE__->badinvo;
+ my($self, $r_text, $escape_func) = @_;
- my $urlsfound = 0;
+ # Might be slower, but it makes the code simpler
+ $escape_func ||= sub { return $_[0] };
+
+ $self->{_uris_found} = 0;
# Don't assume http.
my $old_strict = URI::URL::strict(1);
@@ -116,33 +129,72 @@ sub find {
my $uriRe = sprintf '(?:%s|%s)', $self->uri_re, $self->schemeless_uri_re;
- $$r_text =~ s{(<$uriRe>|$uriRe)}{
- my($orig_match) = $1;
-
- # A heruristic. Often you'll see things like:
- # "I saw this site, http://www.foo.com, and its really neat!"
- # or "Foo Industries (at http://www.foo.com)"
- # We want to avoid picking up the trailing paren, period or comma.
- # Of course, this might wreck a perfectly valid URI, more often than
- # not it corrects a parse mistake.
- $orig_match = $self->decruft($orig_match);
-
- if( my $uri = $self->_is_uri(\$orig_match) ) { # Its a URI.
- $urlsfound++;
-
- # Don't forget to put any cruft we accidentally matched back.
- $self->recruft($self->{callback}->($uri, $orig_match));
+ $$r_text =~ s{ (.*?) (?:(<(?:URL:)?)(.+?)(>)|($uriRe)) | (.+?)$ }{
+ my $replace = '';
+ if( defined $6 ) {
+ $replace = $escape_func->($6);
}
- else { # False alarm.
- # Again, don't forget the cruft.
- $self->recruft($orig_match);
+ else {
+ my $maybe_uri = '';
+
+ $replace = $escape_func->($1);
+
+ if( defined $2 ) {
+ $maybe_uri = $3;
+ my $is_uri = do { # Don't alter $1...
+ $maybe_uri =~ s/\s+//g;
+ $maybe_uri =~ $uriRe;
+ };
+
+ if( $is_uri ) {
+ $replace .= $escape_func->($2);
+ $replace .= $self->_uri_filter($maybe_uri);
+ $replace .= $escape_func->($4);
+ }
+ else {
+ $replace .= $escape_func->($2.$3.$4);
+ }
+ }
+ else {
+ $replace .= $self->_uri_filter($5);
+ }
}
- }eg;
+
+ $replace;
+ }gsex;
URI::URL::strict($old_strict);
- return $urlsfound;
+ return $self->{_uris_found};
}
+
+sub _uri_filter {
+ my($self, $orig_match) = @_;
+
+ # A heuristic. Often you'll see things like:
+ # "I saw this site, http://www.foo.com, and its really neat!"
+ # or "Foo Industries (at http://www.foo.com)"
+ # We want to avoid picking up the trailing paren, period or comma.
+ # Of course, this might wreck a perfectly valid URI, more often than
+ # not it corrects a parse mistake.
+ $orig_match = $self->decruft($orig_match);
+
+ my $replacement = '';
+ if( my $uri = $self->_is_uri(\$orig_match) ) {
+ # It's a URI
+ $self->{_uris_found}++;
+ $replacement = $self->{callback}->($uri, $orig_match);
+ }
+ else {
+ # False alarm
+ $replacement = $orig_match;
+ }
+
+ # Return recrufted replacement
+ return $self->recruft($replacement);
+}
+
+
=back
=head2 Protected Methods
@@ -292,7 +344,7 @@ sub schemeless_to_schemed {
my($self, $uri_cand) = @_;
$uri_cand =~ s|^(<?)ftp\.|$1ftp://ftp\.|
- or $uri_cand =~ s|^(<?)|${1}http://|;
+ or $uri_cand =~ s|^(<?)|${1}http://|;
return $uri_cand;
}
@@ -322,32 +374,50 @@ The args are optional.
=cut
sub badinvo {
- my $package = shift;
- my $level = @_ ? shift : 0;
- my $msg = @_ ? " (" . shift() . ")" : '';
- my $subname = (caller $level + 1)[3];
+ my $package = shift;
+ my $level = @_ ? shift : 0;
+ my $msg = @_ ? " (" . shift() . ")" : '';
+ my $subname = (caller $level + 1)[3];
croak "Bogus invocation of $subname$msg";
}
+=back
+
=head2 Old Functions
The old find_uri() function is still around and it works, but its
deprecated.
+=cut
+
+# Old interface.
+sub find_uris (\$&) {
+ @_ == 2 || __PACKAGE__->badinvo;
+ my($r_text, $callback) = @_;
+
+ my $self = __PACKAGE__->new($callback);
+ return $self->find($r_text);
+}
-=back
=head1 EXAMPLES
-Simply print the original URI text found and the normalized
-representation.
+Store a list of all URIs (normalized) in the document.
- my $finder = URI::Find->new(
- sub {
- my($uri, $orig_uri) = @_;
- print "The text '$orig_uri' represents '$uri'\n";
- return $orig_uri;
- });
+ my @uris;
+ my $finder = URI::Find->new(sub {
+ my($uri) = shift;
+ push @uris, $uri;
+ });
+ $finder->find(\$text);
+
+Print the original URI text found and the normalized representation.
+
+ my $finder = URI::Find->new(sub {
+ my($uri, $orig_uri) = @_;
+ print "The text '$orig_uri' represents '$uri'\n";
+ return $orig_uri;
+ });
$finder->find(\$text);
Check each URI in document to see if it exists.
@@ -355,56 +425,29 @@ Check each URI in document to see if it exists.
use LWP::Simple;
my $finder = URI::Find->new(sub {
- my($uri, $orig_uri) = @_;
- if( head $uri ) {
- print "$orig_uri is okay\n";
- }
- else {
- print "$orig_uri cannot be found\n";
- }
- return $orig_uri;
- });
+ my($uri, $orig_uri) = @_;
+ if( head $uri ) {
+ print "$orig_uri is okay\n";
+ }
+ else {
+ print "$orig_uri cannot be found\n";
+ }
+ return $orig_uri;
+ });
$finder->find(\$text);
Turn plain text into HTML, with each URI found wrapped in an HTML anchor.
use CGI qw(escapeHTML);
+ use URI::Find;
- $text = "<pre>\n" . escapeHTML($text) . "</pre>\n";
- my $finder = URI::Find->new(
- sub {
- my($uri, $orig_uri) = @_;
- return qq|<a href="$uri">$orig_uri</a>|;
- });
- $finder->find(\$text);
-
-
-=head1 CAVEATS, BUGS, ETC...
-
-RFC 2396 Appendix E suggests using the form '<http://www.foo.com>' or
-'<URL:http://www.foo.com>' when putting URLs in plain text. URI::Find
-accomidates this suggestion and considers the entire thing (brackets
-and all) to be part of the URL found. This means that when
-find_uris() sees '<URL:http://www.foo.com>' it will hand that entire
-string to your callback, not just the URL.
-
-NOTE: The prototype on find_uris() is already getting annoying to me.
-I might remove it in a future version.
-
-
-=head1 SEE ALSO
-
- L<URI::Find::Schemeless>, L<URI::URL>, L<URI>,
- RFC 2396 (especially Appendix E)
-
-
-=head1 AUTHOR
-
-Michael G Schwern <schwern at pobox.com> with insight from Uri Gutman,
-Greg Bacon, Jeff Pinyan, Roderick Schertler and others.
-
-Currently maintained by Roderick Schertler <roderick at argon.org>.
+ my $finder = URI::Find->new(sub {
+ my($uri, $orig_uri) = @_;
+ return qq|<a href="$uri">$orig_uri</a>|;
+ });
+ $finder->find(\$text, \&escapeHTML);
+ print "<pre>$text</pre>";
=cut
@@ -416,31 +459,51 @@ sub _is_uri {
my $uri = $$r_uri_cand;
# Translate schemeless to schemed if necessary.
- $uri = $self->schemeless_to_schemed($uri) unless
- $uri =~ /^<?$schemeRe:/;
+ $uri = $self->schemeless_to_schemed($uri) if
+ $uri =~ $self->schemeless_uri_re and
+ $uri !~ /^<?$schemeRe:/;
eval {
$uri = URI::URL->new($uri);
};
- if($@ || !defined $uri) { # leave everything untouched, its not a URI.
+ if($@ || !defined $uri) { # leave everything untouched, its not a URI.
return NO;
}
- else { # Its a URI.
+ else { # Its a URI.
return $uri;
}
}
-# Old interface.
-sub find_uris (\$&) {
- @_ == 2 || __PACKAGE__->badinvo;
- my($r_text, $callback) = @_;
+=head1 NOTES
- my $self = __PACKAGE__->new($callback);
- return $self->find($r_text);
-}
+Will not find URLs with Internationalized Domain Names or pretty much
+any non-ascii stuff in them. See
+L<http://rt.cpan.org/Ticket/Display.html?id=44226>
+
+
+=head1 AUTHOR
+
+Michael G Schwern <schwern at pobox.com> with insight from Uri Gutman,
+Greg Bacon, Jeff Pinyan, Roderick Schertler and others.
+Roderick Schertler <roderick at argon.org> maintained versions 0.11 to 0.16.
+=head1 LICENSE
+
+Copyright 2000, 2009 by Michael G Schwern E<lt>schwern at pobox.comE<gt>.
+
+This program is free software; you can redistribute it and/or
+modify it under the same terms as Perl itself.
+
+See F<http://www.perlfoundation.org/artistic_license_1_0>
+
+=head1 SEE ALSO
+
+L<URI::Find::Schemeless>, L<URI::URL>, L<URI>, RFC 3986 Appendix C
+
+=cut
+
1;
diff --git a/lib/URI/Find/Schemeless.pm b/lib/URI/Find/Schemeless.pm
index 65c0cb7..bf2dd9b 100644
--- a/lib/URI/Find/Schemeless.pm
+++ b/lib/URI/Find/Schemeless.pm
@@ -1,8 +1,6 @@
-# $Id: Schemeless.pm,v 1.8 2005/03/22 16:03:11 roderick Exp $
-#
-# Copyright (c) 2000 Michael G. Schwern. All rights reserved. This
-# program is free software; you can redistribute it and/or modify it
-# under the same terms as Perl itself.
+# Copyright (c) 2000, 2009 Michael G. Schwern. All rights reserved.
+# This program is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
package URI::Find::Schemeless;
@@ -14,7 +12,7 @@ use base qw(URI::Find);
use URI::Find ();
use vars qw($VERSION);
-$VERSION = q$Revision: 1.8 $ =~ /(\d\S+)/ ? $1 : '?';
+$VERSION = 20090319;
my($dnsSet) = 'A-Za-z0-9-';
@@ -66,11 +64,11 @@ sub schemeless_uri_re {
| (?:\d{1,3}\.){3}\d{1,3} ) # not inet_aton() complete
(?:
(?=[\s\Q$cruftSet\E]) # followed by unrelated thing
- (?!\.\w) # but don't stop mid foo.xx.bar
- (?<!\.p[ml]) # but exclude Foo.pm and Foo.pl
- |$ # or end of line
- (?<!\.p[ml]) # but exclude Foo.pm and Foo.pl
- |/[$uricSet#]* # or slash and URI chars
+ (?!\.\w) # but don't stop mid foo.xx.bar
+ (?<!\.p[ml]) # but exclude Foo.pm and Foo.pl
+ |$ # or end of line
+ (?<!\.p[ml]) # but exclude Foo.pm and Foo.pl
+ |/[$uricSet#]* # or slash and URI chars
)
}x;
}
@@ -89,24 +87,32 @@ sub top_level_domain_re {
@_ == 1 || __PACKAGE__->badinvo;
my($self) = shift;
+ # Updated from http://www.iana.org/domains/root/db/
my $plain = join '|', qw(
- aero
- biz
- com
- coop
- edu
- gov
- info
- int
- mil
- museum
- name
- net
- org
- pro
+ AERO
+ ARPA
+ ASIA
+ BIZ
+ CAT
+ COM
+ COOP
+ EDU
+ GOV
+ INFO
+ INT
+ JOBS
+ MIL
+ MOBI
+ MUSEUM
+ NAME
+ NET
+ ORG
+ PRO
+ TEL
+ TRAVEL
);
- return qr/(?:[a-z]{2}|$plain)/;
+ return qr/(?:[a-z]{2}|$plain)/i;
}
=head1 AUTHOR
diff --git a/t/Find.t b/t/Find.t
index fb6796b..b47bc80 100644
--- a/t/Find.t
+++ b/t/Find.t
@@ -1,51 +1,14 @@
-#!perl -w
-use strict;
-
-# $Id: Find.t,v 1.12 2005/07/22 10:02:37 roderick Exp $
-
-# Before `make install' is performed this script should be runnable with
-# `make test'. After `make install' it should work as `perl test.pl'
+#!/usr/bin/perl -w
-use vars qw($Total_tests);
-
-######################### We start with some black magic to print on failure.
+use strict;
-my $loaded;
-my $test_num = 1;
-BEGIN { $| = 1; $^W = 1; }
-END {print "not ok $test_num\n" unless $loaded;}
-print "1..$Total_tests\n";
-use URI::Find;
-use URI::Find::Schemeless ();
-$loaded = 1;
-BEGIN { $Total_tests++ }
-ok(1, 'compile');
+use Test::More 'no_plan';
-######################### End of black magic.
+use_ok 'URI::Find';
+use_ok 'URI::Find::Schemeless';
my $No_joined = @ARGV && $ARGV[0] eq '--no-joined' ? shift : 0;
-sub ok {
- my($test, $name) = @_;
- print "not " unless $test;
- print "ok $test_num";
- print " - $name" if defined $name && !$test;
- print "\n";
- $test_num++;
-}
-
-sub eqarray {
- my($a1, $a2) = @_;
- return 0 unless @$a1 == @$a2;
- my $ok = 1;
- for (0..$#{$a1}) {
- unless($a1->[$_] eq $a2->[$_]) {
- $ok = 0;
- last;
- }
- }
- return $ok;
-}
# %Run contains one entry for each type of finder. Keys are mnemonics,
# required to be a single letter. The values are hashes, keys are names
@@ -57,16 +20,16 @@ sub eqarray {
my %Run;
BEGIN {
%Run = (
- # plain
- P => {
- old_interface => sub { run_function(\&find_uris, @_) },
- regular => sub { run_object('URI::Find', @_) },
- },
- # schemeless
- S => {
- schemeless =>
- sub { run_object('URI::Find::Schemeless', @_) },
- },
+ # plain
+ P => {
+ old_interface => sub { run_function(\&find_uris, @_) },
+ regular => sub { run_object('URI::Find', @_) },
+ },
+ # schemeless
+ S => {
+ schemeless =>
+ sub { run_object('URI::Find::Schemeless', @_) },
+ },
);
die if grep { length != 1 } keys %Run;
@@ -76,8 +39,8 @@ BEGIN {
# which contains the %Run keys which will find the URL, the second is
# the URL itself. Eg:
#
-# [PS => 'http://www.foo.com/'] # found by both P and S
-# [S => 'http://asdf.foo.com/'] # only found by S
+# [PS => 'http://www.foo.com/'] # found by both P and S
+# [S => 'http://asdf.foo.com/'] # only found by S
#
# %Tests maps from input text to a list of specs which describe the URLs
# which will be found. If the value is a reference to an empty list, no
@@ -94,6 +57,8 @@ BEGIN {
# ARGH! URI::URL is inconsistant in how it normalizes URLs!
# HTTP URLs get a trailing slash, FTP and gopher do not.
%Tests = (
+ 'Something something something.travel and stuff'
+ => [[ S => 'http://something.travel/' ]],
'<URL:http://www.perl.com>' => 'http://www.perl.com/',
'<ftp://ftp.site.org>' => 'ftp://ftp.site.org',
'<ftp.site.org>' => [[ S => 'ftp://ftp.site.org' ]],
@@ -111,72 +76,71 @@ BEGIN {
=> 'http://www.deja.com/%5BST_rn=ps%5D/qs.xp?ST=PS&svcclass=dnyr&QRY=lwall&defaultOp=AND&DBS=1&OP=dnquery.xp&LNG=ALL&subjects=&groups=&authors=&fromdate=&todate=&showsort=score&maxhits=25',
'Hmmm, Storyserver from news.com. http://news.cnet.com/news/0-1004-200-1537811.html?tag=st.ne.1002.thed.1004-200-1537811 How nice.'
=> [[S => 'http://news.com/'],
- [$all => 'http://news.cnet.com/news/0-1004-200-1537811.html?tag=st.ne.1002.thed.1004-200-1537811']],
+ [$all => 'http://news.cnet.com/news/0-1004-200-1537811.html?tag=st.ne.1002.thed.1004-200-1537811']],
'$html = get("http://www.perl.com/");' => 'http://www.perl.com/',
q|my $url = url('http://www.perl.com/cgi-bin/cpan_mod');|
=> 'http://www.perl.com/cgi-bin/cpan_mod',
'http://www.perl.org/support/online_support.html#mail'
=> 'http://www.perl.org/support/online_support.html#mail',
- 'irc.lightning.net irc.mcs.net'
- => [[S => 'http://irc.lightning.net/'],
- [S => 'http://irc.mcs.net/']],
- 'foo.bar.xx/~baz/',
- => [[S => 'http://foo.bar.xx/~baz/']],
- 'foo.bar.xx/~baz/ abcd.efgh.mil, none.such/asdf/ hi.there.org'
- => [[S => 'http://foo.bar.xx/~baz/'],
- [S => 'http://abcd.efgh.mil/'],
- [S => 'http://hi.there.org/']],
- 'foo:<1.2.3.4>'
- => [[S => 'http://1.2.3.4/']],
- 'mail.eserv.com.au? failed before ? designated end'
- => [[S => 'http://mail.eserv.com.au/']],
- 'foo.info/himom ftp.bar.biz'
- => [[S => 'http://foo.info/himom'],
- [S => 'ftp://ftp.bar.biz']],
- '(http://round.com)' => 'http://round.com/',
- '[http://square.com]' => 'http://square.com/',
- '{http://brace.com}' => 'http://brace.com/',
- '<http://angle.com>' => 'http://angle.com/',
- '(round.com)' => [[S => 'http://round.com/' ]],
- '[square.com]' => [[S => 'http://square.com/' ]],
- '{brace.com}' => [[S => 'http://brace.com/' ]],
- '<angle.com>' => [[S => 'http://angle.com/' ]],
- '<x>intag.com</x>' => [[S => 'http://intag.com/' ]],
- '[mailto:somebody at company.ext]' => 'mailto:somebody at company.ext',
- 'HTtp://MIXED-Case.Com' => 'http://mixed-case.com/',
-
- # False tests
- 'HTTP::Request::Common' => [],
- 'comp.infosystems.www.authoring.cgi' => [],
- 'MIME/Lite.pm' => [],
- 'foo at bar.baz.com' => [],
- 'Foo.pm' => [],
- 'Foo.pl' => [],
- 'hi Foo.pm Foo.pl mom' => [],
- 'x comp.ai.nat-lang libdb.so.3 x' => [],
- 'x comp.ai.nat-lang libdb.so.3 x' => [],
- 'www.marselisl www.info at skive-hallerne.dk' => [],
+ 'irc.lightning.net irc.mcs.net'
+ => [[S => 'http://irc.lightning.net/'],
+ [S => 'http://irc.mcs.net/']],
+ 'foo.bar.xx/~baz/',
+ => [[S => 'http://foo.bar.xx/~baz/']],
+ 'foo.bar.xx/~baz/ abcd.efgh.mil, none.such/asdf/ hi.there.org'
+ => [[S => 'http://foo.bar.xx/~baz/'],
+ [S => 'http://abcd.efgh.mil/'],
+ [S => 'http://hi.there.org/']],
+ 'foo:<1.2.3.4>'
+ => [[S => 'http://1.2.3.4/']],
+ 'mail.eserv.com.au? failed before ? designated end'
+ => [[S => 'http://mail.eserv.com.au/']],
+ 'foo.info/himom ftp.bar.biz'
+ => [[S => 'http://foo.info/himom'],
+ [S => 'ftp://ftp.bar.biz']],
+ '(http://round.com)' => 'http://round.com/',
+ '[http://square.com]' => 'http://square.com/',
+ '{http://brace.com}' => 'http://brace.com/',
+ '<http://angle.com>' => 'http://angle.com/',
+ '(round.com)' => [[S => 'http://round.com/' ]],
+ '[square.com]' => [[S => 'http://square.com/' ]],
+ '{brace.com}' => [[S => 'http://brace.com/' ]],
+ '<angle.com>' => [[S => 'http://angle.com/' ]],
+ '<x>intag.com</x>' => [[S => 'http://intag.com/' ]],
+ '[mailto:somebody at company.ext]' => 'mailto:somebody at company.ext',
+ 'HTtp://MIXED-Case.Com' => 'http://mixed-case.com/',
+
+ # False tests
+ 'HTTP::Request::Common' => [],
+ 'comp.infosystems.www.authoring.cgi' => [],
+ 'MIME/Lite.pm' => [],
+ 'foo at bar.baz.com' => [],
+ 'Foo.pm' => [],
+ 'Foo.pl' => [],
+ 'hi Foo.pm Foo.pl mom' => [],
+ 'x comp.ai.nat-lang libdb.so.3 x' => [],
+ 'x comp.ai.nat-lang libdb.so.3 x' => [],
+ 'www.marselisl www.info at skive-hallerne.dk' => [],
# XXX broken
-# q{$url = 'http://'.rand(1000000).'@anonymizer.com/'.$url;}
-# => [],
+# q{$url = 'http://'.rand(1000000).'@anonymizer.com/'.$url;}
+# => [],
);
# Convert plain string values to a list of 1 spec which indicates
# that all finders will find that as the only URL.
for (@Tests{keys %Tests}) {
- $_ = [[$all, $_]] if !ref;
+ $_ = [[$all, $_]] if !ref;
}
# Run everything together as one big test.
$Tests{join "\n", keys %Tests} = [map { @$_ } values %Tests]
- unless $No_joined;
+ unless $No_joined;
# Each test yields 3 tests for each finder (return value matches
# number returned, matches equal expected matches, text was not
# modified).
my $finders = 0;
$finders += keys %{ $Run{$_} } for keys %Run;
- $Total_tests += 3 * $finders * keys %Tests;
}
# Given a run type and a list of specs, return the URLs which that type
@@ -187,8 +151,8 @@ sub specs_to_urls {
my @out;
for (@spec) {
- my ($found_by_types, $url) = @$_;
- push @out, $url if index($found_by_types, $this_type) >= 0;
+ my ($found_by_types, $url) = @$_;
+ push @out, $url if index($found_by_types, $this_type) >= 0;
}
return @out;
@@ -210,23 +174,19 @@ sub run_object {
sub run {
my ($orig_text, @spec) = @_;
- print "# testing [$orig_text]\n";
+ note "# testing [$orig_text]\n";
for my $run_type (keys %Run) {
- print "# run type $run_type\n";
- while( my($run_name, $run_sub) = each %{ $Run{$run_type} } ) {
- print "# running $run_name\n";
- my @want = specs_to_urls $run_type, @spec;
- my $text = $orig_text;
- my @out;
- my $n = $run_sub->(\$text, sub { push @out, $_[0]; $_[1] });
- ok $n == @out,
- "invalid return value, returned $n but got " . scalar @out;
- ok eqarray(\@want, \@out),
- "output mismatch, want:\n" . join("\n", @want)
- . "\ngot:\n" . join("\n", @out);
- ok $text eq $orig_text,
- "text was modified, [$orig_text] => [$text]";
- }
+ note "# run type $run_type\n";
+ while( my($run_name, $run_sub) = each %{ $Run{$run_type} } ) {
+ note "# running $run_name\n";
+ my @want = specs_to_urls $run_type, @spec;
+ my $text = $orig_text;
+ my @out;
+ my $n = $run_sub->(\$text, sub { push @out, $_[0]; $_[1] });
+ is $n, @out, "return value length";
+ is_deeply \@out, \@want, "output" or diag("Original text: $text");
+ is $text, $orig_text, "text unmodified";
+ }
}
}
@@ -236,11 +196,10 @@ while( my($text, $rspec_list) = each %Tests ) {
# We used to turn URI::URL strict on and leave it on.
-BEGIN { $Total_tests += 2 }
for my $val (0, 1) {
URI::URL::strict($val);
my $f = URI::Find->new(sub { });
my $t = "foo";
$f->find(\$t);
- ok $val == URI::URL::strict, "URI::URL::strict $val";
+ is $val, URI::URL::strict(), "URI::URL::strict $val";
}
diff --git a/t/filter.t b/t/filter.t
new file mode 100644
index 0000000..c7636e4
--- /dev/null
+++ b/t/filter.t
@@ -0,0 +1,39 @@
+#!/usr/bin/perl -w
+
+# Test the filter function
+
+use strict;
+
+use Test::More 'no_plan';
+
+use URI::Find;
+
+
+my @tasks = (
+ ["Foo&Bar http://abc.com.", "Foo&Bar xx&."],
+ ["http://abc.com. http://abc.com.", "xx&. xx&."],
+ ["http://abc.com?foo=bar&baz=foo", "xx&"],
+ ["& http://abc.com?foo=bar&baz=foo", "& xx&"],
+ ["http://abc.com?foo=bar&baz=foo &", "xx& &"],
+ ["Foo&Bar http://abc.com", "Foo&Bar xx&"],
+ ["http://abc.com. Foo&Bar", "xx&. Foo&Bar"],
+ ["Foo&Bar http://abc.com. Foo&Bar", "Foo&Bar xx&. Foo&Bar"],
+ ["Foo&Bar\nhttp://abc.com.\nFoo&Bar", "Foo&Bar\nxx&.\nFoo&Bar"],
+ ["Foo&Bar\nhttp://abc.com. http://def.com.\nFoo&Bar",
+ "Foo&Bar\nxx&. xx&.\nFoo&Bar"],
+);
+
+for my $task (@tasks) {
+ my($str, $result) = @$task;
+ my $org = $str;
+ my $f = URI::Find->new(sub { return "xx&" });
+ $f->find(\$str, \&simple_escape);
+ is($str, $result, "escape $org");
+}
+
+sub simple_escape {
+ my($toencode) = @_;
+
+ $toencode =~ s{&}{&}gso;
+ return $toencode;
+}
diff --git a/t/is_schemed.t b/t/is_schemed.t
new file mode 100644
index 0000000..ff735d1
--- /dev/null
+++ b/t/is_schemed.t
@@ -0,0 +1,17 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+use Test::More 'no_plan';
+
+use URI::Find;
+
+my @tests = (
+ ["http://foo.bar" => 1],
+ ["foo.com" => 0],
+);
+
+for my $test (@tests) {
+ my($uri, $want) = @$test;
+ is !!URI::Find->is_schemed($uri), !!$want, "is_schemed($uri)";
+}
diff --git a/t/load-schemeless.t b/t/load-schemeless.t
index 9e795b2..2fd0217 100644
--- a/t/load-schemeless.t
+++ b/t/load-schemeless.t
@@ -1,16 +1,11 @@
-#!perl -w
-use strict;
-
-# $Id: load-schemeless.t,v 1.2 2001/07/27 12:41:54 roderick Exp $
+#!/usr/bin/perl -w
# An error in base.pm in 5.005_03 causes it not to load URI::Find when
# invoked from URI::Find::Schemeless. Prevent regression.
-BEGIN {
- print "1..1\n";
-}
+use strict;
+
+use Test::More tests => 2;
-print eval { require URI::Find::Schemeless;
- URI::Find::Schemeless->new(sub {}) }
- ? "ok 1\n"
- : "not ok 1 ($@)\n";
+require_ok 'URI::Find::Schemeless';
+new_ok 'URI::Find::Schemeless' => [sub {}];
diff --git a/t/rfc3986_appendix_c.t b/t/rfc3986_appendix_c.t
new file mode 100644
index 0000000..da170fc
--- /dev/null
+++ b/t/rfc3986_appendix_c.t
@@ -0,0 +1,34 @@
+#!/usr/bin/perl -w
+
+use strict;
+
+
+# RFC 3986 Appendix C covers "Delimiting a URI in Context"
+# and it has this example...
+
+my $Example = <<"END";
+Yes, Jim, I found it under "http://www.w3.org/Addressing/",
+but you can probably pick it up from <ftp://foo.example.
+com/rfc/>. Note the warning in <http://www.ics.uci.edu/pub/
+ietf/uri/historical.html#WARNING>.
+END
+
+# Which should find these URIs
+my @Uris = (
+ "http://www.w3.org/Addressing/",
+ "ftp://foo.example.com/rfc/",
+ "http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING",
+);
+
+
+use Test::More tests => 1;
+use URI::Find;
+
+my @found;
+my $finder = URI::Find->new(sub {
+ my($uri) = @_;
+ push @found, $uri;
+});
+$finder->find(\$Example);
+
+is_deeply \@found, \@Uris, "RFC 3986 Appendix C example";
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/pkg-perl/packages/liburi-find-perl.git
More information about the Pkg-perl-cvs-commits
mailing list