r72213 - in /trunk/w3c-linkchecker: MANIFEST META.yml NEWS SIGNATURE bin/checklink bin/checklink.pod debian/changelog docs/linkchecker.js etc/checklink.conf lib/W3C/LinkChecker.pm

periapt-guest at users.alioth.debian.org
Sun Apr 3 21:54:42 UTC 2011


Author: periapt-guest
Date: Sun Apr  3 21:54:29 2011
New Revision: 72213

URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=72213
Log:
TODO: check javascript file carefully
* New upstream release

Added:
    trunk/w3c-linkchecker/docs/linkchecker.js
      - copied unchanged from r72212, branches/upstream/w3c-linkchecker/current/docs/linkchecker.js
Modified:
    trunk/w3c-linkchecker/MANIFEST
    trunk/w3c-linkchecker/META.yml
    trunk/w3c-linkchecker/NEWS
    trunk/w3c-linkchecker/SIGNATURE
    trunk/w3c-linkchecker/bin/checklink
    trunk/w3c-linkchecker/bin/checklink.pod
    trunk/w3c-linkchecker/debian/changelog
    trunk/w3c-linkchecker/etc/checklink.conf
    trunk/w3c-linkchecker/lib/W3C/LinkChecker.pm

Modified: trunk/w3c-linkchecker/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/MANIFEST?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/MANIFEST (original)
+++ trunk/w3c-linkchecker/MANIFEST Sun Apr  3 21:54:29 2011
@@ -9,7 +9,8 @@
 etc/checklink.conf      Optional configuration file
 etc/perltidyrc          perltidy(1) profile
 docs/checklink.html     Additional documentation
-docs/linkchecker.css    Cascading style sheet for the documentation
+docs/linkchecker.css    Cascading style sheet used in docs and generated HTML
+docs/linkchecker.js     JavaScript used in the generated HTML
 images/double.png
 images/grad.png
 images/head-bl.png

Modified: trunk/w3c-linkchecker/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/META.yml?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/META.yml (original)
+++ trunk/w3c-linkchecker/META.yml Sun Apr  3 21:54:29 2011
@@ -1,6 +1,6 @@
 --- #YAML:1.0
 name:               W3C-LinkChecker
-version:            4.7
+version:            4.8
 abstract:           W3C Link Checker
 author:
     - W3C QA-dev Team <public-qa-dev at w3.org>

Modified: trunk/w3c-linkchecker/NEWS
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/NEWS?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/NEWS (original)
+++ trunk/w3c-linkchecker/NEWS Sun Apr  3 21:54:29 2011
@@ -1,7 +1,15 @@
 This document contains information about high level changes between
 Link Checker releases.
 
-Version 4.7
+Version 4.8 - 2011-04-02
+- Avoid some robot delays by improving the order in which links are checked.
+- Avoid some unnecessary HEAD requests in recursive mode.
+- Clarify output wrt. links that have already been checked.
+- Make connection cache size configurable, and increase the default to 2.
+- Move JavaScript to an external file.
+- Check applet and object archive links.
+
+Version 4.7 - 2011-03-17
 - Support for IRI.
 - Support for more HTML5 links.
 - Decode query string parameters as UTF-8.
@@ -9,7 +17,7 @@
 - New dependencies: Encode-Locale (command line mode only).
 - Updated dependencies: libwww-perl >= 5.833, URI >= 1.53.
 
-Version 4.6
+Version 4.6 - 2010-05-01
 - Support for checking links in CSS.
 - Results UI improvements, added "progress bar".
 - Support for larger variety of character and content encodings.
@@ -22,13 +30,13 @@
 - New dependencies: CSS-DOM >= 0.09.
 - Updated dependencies: Perl >= 5.8.
 
-Version 4.5
+Version 4.5 - 2009-03-30
 - Removed W3C trademarked icons from distribution tarball.
 - Avoid "false positive" failures from "make test" in certain setups.
 - Make quiet command line mode quieter.
 - Lowered default timeout to 30 seconds.
 
-Version 4.4
+Version 4.4 - 2009-02-12
 - checking more elements and attributes, such as BLOCKQUOTE cite="", BODY
   background="", EMBED, etc
 - Changes in the UI to make it match other validators more closely
@@ -38,21 +46,21 @@
 - Add non-robot developer mode
 - many bug fixes and code cleanup
 
-Version 4.3
+Version 4.3 - 2006-10-22
 - Various minor improvements to result output, both in text and HTML modes.
 - Fix --quiet and checking multiple documents to match documentation.
 - Eliminate various warnings (emitted by code, not from results).
 - Documentation improvements.
 
-Version 4.2.1
+Version 4.2.1 - 2005-05-15
 - Include documentation of the reorganized access keys.
 
-Version 4.2
+Version 4.2 - 2005-04-27
 - Access key reorganization, making them less likely to conflict with
   browsers' "native" key bindings.
 - Redirects are now checked for private IP addresses too.
 
-Version 4.1
+Version 4.1 - 2004-11-24
 - Added workarounds against browser timeouts in "summary only" mode.
 - Improved caching and reuse of fetched /robots.txt information.
 - Fixed a bug where a complete protocol response (including headers)
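
The "Check applet and object archive links" entry corresponds to the new tag => [separator, attributes] table in bin/checklink: HTML 4 defines applet/@archive as a comma-separated list and object/@archive as a space-separated one, so each tag carries its own split pattern. A minimal sketch of that lookup, assuming a plain hash of attributes; list_links and the sample jar names are hypothetical and not part of checklink:

```perl
use strict;
use warnings;

# Same shape as the patched LINK_LIST_ATTRS: tag => [separator, [attributes]].
my %link_list_attrs = (
    applet => [qr/[\s,]+/, ['archive']],
    object => [qr/\s+/,    ['archive']],
);

# Hypothetical helper: return the individual links listed in one tag's
# attributes, or an empty list for tags that have no list attributes.
sub list_links {
    my ($tag, $attr) = @_;
    my $entry = $link_list_attrs{$tag} or return;
    my ($sep, $attrs) = @$entry;
    return map { my $v = $attr->{$_}; split($sep, $v) }
        grep { defined $attr->{$_} } @$attrs;
}

print join('|', list_links('applet', {archive => 'a.jar, b.jar'})), "\n";
print join('|', list_links('object', {archive => 'a.jar b.jar'})),  "\n";
```

Both calls print a.jar|b.jar; only the separator used to get there differs.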

Modified: trunk/w3c-linkchecker/SIGNATURE
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/SIGNATURE?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/SIGNATURE (original)
+++ trunk/w3c-linkchecker/SIGNATURE Sun Apr  3 21:54:29 2011
@@ -14,16 +14,17 @@
 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA1
 
-SHA1 be94f7305b57b86945ffac2e855cd9c687d829ec MANIFEST
-SHA1 838d23b4a1e435126c9ee820fdd51f7f0baa43ca META.yml
+SHA1 b075772a968f5694bfbb4ce33eadf26566a25f47 MANIFEST
+SHA1 2c2e46c15a894e6fdb6360a96d3e46fef368ea13 META.yml
 SHA1 ab9150095a45776c2020e5781d19054c7018da8b Makefile.PL
-SHA1 66a454333dfdb1ab49d1846d583aba735ed35ab7 NEWS
+SHA1 0e45d552ca655a7aa616b5580fe26360194c7b25 NEWS
 SHA1 f1f868ea73db7d39ab491ebb50c84de76cce4b44 README
-SHA1 4e5c0c858e53971eb11d4621cc36720617242f6c bin/checklink
-SHA1 07cc637f007a0d57868a9f1105d4e9b7c6c8da5d bin/checklink.pod
+SHA1 75b87d400f5656fa36865cbb8638ae761ef8a045 bin/checklink
+SHA1 4406433ae670dd4f7be3f2c76d55aefb239e9bc9 bin/checklink.pod
 SHA1 b188063249c820f0aa5a34b5f735e8f334a536e1 docs/checklink.html
 SHA1 fa101fed018fc8e41beca63a0a667fb94c10a557 docs/linkchecker.css
-SHA1 94659a6cba9d947859df23d202aa4c411e2c488b etc/checklink.conf
+SHA1 8fa71b54357c9ed6ac8e01ab600120032d35b080 docs/linkchecker.js
+SHA1 92d01a8a6e7edcd200d70492f4e551984b97b7a0 etc/checklink.conf
 SHA1 87c74944dbc80b5d6ab8aac1d09419607b15efff etc/perltidyrc
 SHA1 bcb7896bee3764f85a03ab14495efc233f70e215 images/double.png
 SHA1 ff9a7be7fee245dd81a7dc4124544d692a140119 images/grad.png
@@ -37,16 +38,16 @@
 SHA1 401b5fba02d0d8484775a4a77503fa0d136b96ce images/round-br.png
 SHA1 9eb1ee6188391715284a3db080e6e92d163864d9 images/round-tr.png
 SHA1 cc01bd358bc1d6d42ca350ad0a4a42778ca4440e images/textbg.png
-SHA1 307f3bef10b817772b619a97b23a711fd06fd3e8 lib/W3C/LinkChecker.pm
+SHA1 993d4a54cd4a6672afeaa938d15cd9154f94aa44 lib/W3C/LinkChecker.pm
 SHA1 962ba9fff082c4087239b55618ada2a8f1564464 t/00compile.t
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
-iQEVAwUBTYJfUod580Rxl2NsAQKW6Af/XVf0TroCS1AuV7y9gEqWLGFxEfqY19WF
-f44SU3dwuawr2NMSOPBCcdpThfPrteVMBXjZYTXS7kqZnWcQ2chwTPknYX6g6zrX
-L5mmXNxx7CuYG/1CO1h0+deZ231Z+/R/0uKYDOsL9FsdhTrAJ7qyP3tJyfNECZie
-0g+t+xtYRhXbMUw1LFSB+81szZv1XXZVRnKhLWP54kwBVbebt4/XmsMCYtdCDzPK
-2IU4NSssB/rNxykbNTT8EqPYT8ecXeNG7YqNZkdcGimKzfzsYxdcZnIo6WLGP6Yg
-sTon4mKVsIpeGwYYf4uAprc1Jqf+g+EOhUOP7XWOf6sWQkFrmYR6zQ==
-=APVv
+iQEVAwUBTZdffId580Rxl2NsAQIQgwf/VFvg4vg7KvODiSA5vkfmGJU56Pr9Oxbq
+MCkmCpWfVHo3i4Dzxz7QTubELk6nksKHaoUfVdDCmgRaG9XNVZBb59WCPzedFYsS
+7BoUpzB0u580fOfBO0FhxbEIfEVoGplFN/9BTMBHzJxO/fSRNwHnqsZ1nn1yeCN4
+j23yqibBQapFnd8NFyNHSEzTDEsqtV7cLLYljJlljYP5au2IChaV3hAJ3gsRs0OL
+KVLGGoPQSHR/MhxzIWfituh8MwB4ttjZ5Z0AQibiUfcCfxBA+rgrsT61rquLJOmk
+wUQQHXZVBj9xXdB7fbbezi44+kqOf4U2GTgmNr1quexHm1W24YPd9w==
+=yFBd
 -----END PGP SIGNATURE-----

Modified: trunk/w3c-linkchecker/bin/checklink
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/bin/checklink?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/bin/checklink (original)
+++ trunk/w3c-linkchecker/bin/checklink Sun Apr  3 21:54:29 2011
@@ -268,18 +268,19 @@
     video  => ['src', 'poster'],
 };
 
-# Tag=>attribute mapping of things we treat as space separated lists of links.
+# Tag=>[separator, attributes] mapping of things we treat as lists of links.
 use constant LINK_LIST_ATTRS => {
-    a    => ['ping'],
-    area => ['ping'],
-    head => ['profile'],
+    a      => [qr/\s+/,    ['ping']],
+    applet => [qr/[\s,]+/, ['archive']],
+    area   => [qr/\s+/,    ['ping']],
+    head   => [qr/\s+/,    ['profile']],
+    object => [qr/\s+/,    ['archive']],
 };
 
 # TBD/TODO:
-# - applet/@archive, @code?
+# - applet/@code?
 # - bgsound/@src?
 # - object/@classid?
-# - object/@archive?
 # - isindex/@action?
 # - layer/@background, @src?
 # - ilayer/@background?
@@ -293,7 +294,7 @@
     # Version info
     $PACKAGE  = 'W3C Link Checker';
     $PROGRAM  = 'W3C-checklink';
-    $VERSION  = '4.7';
+    $VERSION  = '4.8';
     $REVISION = sprintf('version %s (c) 1999-2011 W3C', $VERSION);
     $AGENT    = sprintf(
         '%s/%s %s',
@@ -362,33 +363,17 @@
     $DocType =
         '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';
     my $css_url = URI->new_abs('linkchecker.css', $Cfg{Doc_URI});
-    $Head = sprintf(<<'EOF', HTML::Entities::encode($AGENT), $css_url);
+    my $js_url  = URI->new_abs('linkchecker.js',  $Cfg{Doc_URI});
+    $Head =
+        sprintf(<<'EOF', HTML::Entities::encode($AGENT), $css_url, $js_url);
 <meta http-equiv="Content-Script-Type" content="text/javascript" />
 <meta name="generator" content="%s" />
 <link rel="stylesheet" type="text/css" href="%s" />
-<script type="text/javascript">
-function show_progress(progress_id, progress_text, progress_percentage)
-{
-    var div = document.getElementById("progress" + progress_id);
-
-    var head = div.getElementsByTagName("h3")[0];
-    var text = document.createTextNode(progress_text);
-    var span = document.createElement("span");
-    span.appendChild(text);
-    head.replaceChild(span, head.getElementsByTagName("span")[0]);
-
-    var bar = div.getElementsByTagName("div")[0];
-    bar.firstChild.style.width = progress_percentage;
-    bar.title = progress_percentage;
-
-    var pre = div.getElementsByTagName("pre")[0];
-    pre.scrollTop = pre.scrollHeight;
-}
-</script>
+<script type="text/javascript" src="%s"></script>
 EOF
 
     # Trusted environment variables that need laundering in taint mode.
-    foreach (qw(NNTPSERVER NEWSHOST)) {
+    for (qw(NNTPSERVER NEWSHOST)) {
         ($ENV{$_}) = ($ENV{$_} =~ /^(.*)$/) if $ENV{$_};
     }
 
@@ -417,6 +402,7 @@
     Hide_Same_Realm => 0,
     Depth                    => 0,             # < 0 means unlimited recursion.
     Sleep_Time               => 1,
+    Connection_Cache_Size    => 2,
     Max_Documents            => 150,           # For the online version.
     User                     => undef,
     Password                 => undef,
@@ -462,8 +448,7 @@
 
 my $ua = W3C::UserAgent->new($AGENT);    # @@@ TODO: admin address
 
-# @@@ make number of keep-alive connections customizable
-$ua->conn_cache({total_capacity => 1});    # 1 keep-alive connection
+$ua->conn_cache({total_capacity => $Opts{Connection_Cache_Size}});
 if ($ua->can('delay')) {
     $ua->delay($Opts{Sleep_Time} / 60);
 }
@@ -533,7 +518,7 @@
 
     my $check_num = 1;
     my @bases     = @{$Opts{Base_Locations}};
-    foreach my $uri (@ARGV) {
+    for my $uri (@ARGV) {
 
         # Reset base locations so that previous URI's given on the command line
         # won't affect the recursion scope for this URI (see check_uri())
@@ -550,7 +535,7 @@
     if ($Opts{HTML}) {
         &html_footer();
     }
-    elsif (($doc_count > 0) && !$Opts{Summary_Only}) {
+    elsif ($doc_count > 0 && !$Opts{Summary_Only}) {
         printf("\n%s\n", &global_stats());
     }
 
@@ -589,7 +574,7 @@
     $uri = $query->param('uri');
 
     if (!$uri) {
-        &html_header('', 1);    # Set cookie only from results page.
+        &html_header('', undef);    # Set cookie only from results page.
         my %cookies = CGI::Cookie->fetch();
         &print_form(scalar($query->Vars()), $cookies{$PROGRAM}, 1);
         &html_footer();
@@ -737,6 +722,7 @@
         'u|user=s'                    => \$Opts{User},
         'p|password=s'                => \$Opts{Password},
         't|timeout=i'                 => \$Opts{Timeout},
+        'C|connection-cache=i'        => \$Opts{Connection_Cache_Size},
         'S|sleep=i'                   => \$Opts{Sleep_Time},
         'L|languages=s'               => \$Opts{Accept_Language},
         'c|cookies=s'                 => \$Opts{Cookies},
@@ -1055,7 +1041,7 @@
     return if defined($response->{Stop});
 
     if ($Opts{HTML}) {
-        &html_header($uri, 0, $cookie) if ($check_num == 1);
+        &html_header($uri, $cookie) if ($check_num == 1);
         &print_form($params, $cookie, $check_num) if $is_start;
     }
 
@@ -1227,6 +1213,7 @@
         scalar(keys %{$p->{Links}}))
         if ($Opts{Verbose});
     my %links;
+    my %hostlinks;
 
     # Record all the links found
     while (my ($link, $lines) = each(%{$p->{Links}})) {
@@ -1247,7 +1234,13 @@
         my $canon_uri = URI->new($abs_link_uri->canonical());
         my $fragment  = $canon_uri->fragment(undef);
         if (!defined($Opts{Exclude}) || $canon_uri !~ $Opts{Exclude}) {
-            foreach my $line_num (keys(%$lines)) {
+            if (!exists($links{$canon_uri})) {
+                my $hostport =
+                    $canon_uri->can('host_port') ? $canon_uri->host_port() :
+                                                   '';
+                push(@{$hostlinks{$hostport}}, $canon_uri);
+            }
+            for my $line_num (keys(%$lines)) {
                 if (!defined($fragment) || !length($fragment)) {
 
                     # Document without fragment
@@ -1262,17 +1255,20 @@
         }
     }
 
+    my @order = &distribute_links(\%hostlinks);
+    undef %hostlinks;
+
     # Build the list of broken URI's
 
-    my $nlinks = scalar(keys(%links));
+    my $nlinks = scalar(@order);
 
     &hprintf("Checking %d links to build list of broken URI's\n", $nlinks)
         if ($Opts{Verbose});
 
     my %broken;
     my $link_num = 0;
-    while (my ($u, $ulinks) = each(%links)) {
-        $u = URI->new($u);
+    for my $u (@order) {
+        my $ulinks = $links{$u};
 
         if ($Opts{Summary_Only}) {
 
@@ -1330,7 +1326,7 @@
             $broken{$u}{location} = 1;
 
             # All the fragments associated are hence broken
-            foreach my $fragment (keys %{$ulinks->{fragments}}) {
+            for my $fragment (keys %{$ulinks->{fragments}}) {
                 $broken{$u}{fragments}{$fragment}++;
             }
         }
@@ -1357,7 +1353,7 @@
     # Do we want to process other documents?
     if ($depth != 0) {
 
-        foreach my $u (map { URI->new($_) } keys %links) {
+        for my $u (map { URI->new($_) } keys %links) {
 
             next unless $results{$u}{location}{success};    # Broken link?
 
@@ -1402,6 +1398,42 @@
     return;
 }
 
+###############################################################
+# Distribute links based on host:port to avoid RobotUA delays #
+###############################################################
+
+sub distribute_links(\%)
+{
+    my $hostlinks = shift;
+
+    # Hosts ordered by weight (number of links), descending
+    my @order =
+        sort { scalar(@{$hostlinks->{$b}}) <=> scalar(@{$hostlinks->{$a}}) }
+        keys %$hostlinks;
+
+    # All link list flattened into one, in host weight order
+    my @all;
+    push(@all, @{$hostlinks->{$_}}) for @order;
+
+    return @all if (scalar(@order) < 2);
+
+    # Indexes and chunk size for "zipping" the end result list
+    my $num = scalar(@{$hostlinks->{$order[0]}});
+    my @indexes = map { $_ * $num } (0 .. $num - 1);
+
+    # Distribute them
+    my @result;
+    while (my @chunk = splice(@all, 0, $num)) {
+        @result[@indexes] = @chunk;
+        @indexes = map { $_ + 1 } @indexes;
+    }
+
+    # Weed out undefs
+    @result = grep(defined, @result);
+
+    return @result;
+}
+
 ##########################################
 # Decode Content-Encodings in a response #
 ##########################################
@@ -1457,13 +1489,13 @@
     # Get the resource
     my $response;
     if (defined($results{$uri}{response}) &&
-        !(($method eq 'GET') && ($results{$uri}{method} eq 'HEAD')))
+        !($method eq 'GET' && $results{$uri}{method} eq 'HEAD'))
     {
         $response = $results{$uri}{response};
     }
     else {
         $response = &get_uri($method, $uri, $referer);
-        &record_results($uri, $method, $response);
+        &record_results($uri, $method, $response, $referer);
         &record_redirects($redirects, $response);
     }
     if (!$response->is_success()) {
@@ -1476,7 +1508,7 @@
             }
             else {
                 if ($Opts{HTML}) {
-                    &html_header($uri, 0, $cookie) if ($check_num == 1);
+                    &html_header($uri, $cookie) if ($check_num == 1);
                     &print_form($params, $cookie, $check_num) if $is_start;
                     print "<p>", &status_icon($response->code());
                 }
@@ -1510,7 +1542,7 @@
         # No, there is a problem...
         if (!$in_recursion) {
             if ($Opts{HTML}) {
-                &html_header($uri, 0, $cookie) if ($check_num == 1);
+                &html_header($uri, $cookie) if ($check_num == 1);
                 &print_form($params, $cookie, $check_num) if $is_start;
                 print "<p>", &status_icon(406);
 
@@ -1543,7 +1575,7 @@
         return 0 if ($candidate =~ $excluded_doc);
     }
 
-    foreach my $base (@{$Opts{Base_Locations}}) {
+    for my $base (@{$Opts{Base_Locations}}) {
         my $rel = $candidate->rel($base);
         next if ($candidate eq $rel);    # Relative path not possible?
         next if ($rel =~ m|^(\.\.)?/|);  # Relative path upwards?
@@ -1704,9 +1736,10 @@
 # Record the results of an HTTP request #
 #########################################
 
-sub record_results (\$$$)
-{
-    my ($uri, $method, $response) = @_;
+sub record_results (\$$$$)
+{
+    my ($uri, $method, $response, $referer) = @_;
+    $results{$uri}{referer}        = $referer;
     $results{$uri}{response}       = $response;
     $results{$uri}{method}         = $method;
     $results{$uri}{location}{code} = $response->code();
@@ -1753,8 +1786,8 @@
 
     # What type of broken link is it? (stored in {record} - the {display}
     #              information is just for visual use only)
-    if (($results{$uri}{location}{display} == 401) &&
-        ($results{$uri}{location}{code} == 404))
+    if ($results{$uri}{location}{display} == 401 &&
+        $results{$uri}{location}{code} == 404)
     {
         $results{$uri}{location}{record} = 404;
     }
@@ -2015,6 +2048,10 @@
         elsif ($tag eq 'applet' || $tag eq 'object') {
             if (my $codebase = $attr->{codebase}) {
 
+                # Applet codebases are directories, append trailing slash
+                # if it's not there so that new_abs does the right thing.
+                $codebase .= "/" if ($tag eq 'applet' && $codebase !~ m|/$|);
+
                 # TODO: HTML 4 spec says applet/@codebase may only point to
                 # subdirs of the directory containing the current document.
                 # Should we do something about that?
@@ -2031,9 +2068,10 @@
 
         # List of links attributes:
         if (my $link_attrs = LINK_LIST_ATTRS()->{$tag}) {
-            for my $la (@$link_attrs) {
+            my ($sep, $attrs) = @$link_attrs;
+            for my $la (@$attrs) {
                 if (defined(my $value = $attr->{$la})) {
-                    for my $link (split(/\s+/, $value)) {
+                    for my $link (split($sep, $value)) {
                         $self->add_link($link, $tag_local_base, $line);
                     }
                 }
@@ -2109,9 +2147,9 @@
 
     # Extract the doctype
     my @declaration = split(/\s+/, $text, 4);
-    if (($#declaration >= 3) &&
-        ($declaration[0] eq 'DOCTYPE') &&
-        (lc($declaration[1]) eq 'html'))
+    if ($#declaration >= 3 &&
+        $declaration[0] eq 'DOCTYPE' &&
+        lc($declaration[1]) eq 'html')
     {
 
         # Parse the doctype declaration
@@ -2164,25 +2202,28 @@
     # $links is a hash of the links in the documents checked
     # $redirects is a map of the redirects encountered
 
-    # Get the document with the appropriate method
-    # Only use GET if there are fragments. HEAD is enough if it's not the
-    # case.
-    my @fragments = keys %{$links->{$uri}{fragments}};
-    my $method = scalar(@fragments) ? 'GET' : 'HEAD';
+    # Get the document with the appropriate method: GET if there are
+    # fragments to check or links are wanted, HEAD is enough otherwise.
+    my $fragments = $links->{$uri}{fragments} || {};
+    my $method = ($want_links || %$fragments) ? 'GET' : 'HEAD';
 
     my $response;
     my $being_processed = 0;
-    if ((!defined($results{$uri})) ||
-        (($method eq 'GET') && ($results{$uri}{method} eq 'HEAD')))
+    if (!defined($results{$uri}) ||
+        ($method eq 'GET' && $results{$uri}{method} eq 'HEAD'))
     {
         $being_processed = 1;
         $response = &get_uri($method, $uri, $referer);
 
         # Get the information back from get_uri()
-        &record_results($uri, $method, $response);
+        &record_results($uri, $method, $response, $referer);
 
         # Record the redirects
         &record_redirects($redirects, $response);
+    }
+    elsif (!($Opts{Summary_Only} || (!$doc_count && $Opts{HTML}))) {
+        my $ref = $results{$uri}{referer};
+        &hprintf("Already checked%s\n", $ref ? ", referrer $ref" : ".");
     }
 
     # We got the response of the HTTP request. Stop here if it was a HEAD.
@@ -2220,7 +2261,7 @@
     }
 
     # Check that the fragments exist
-    foreach my $fragment (keys %{$links->{$uri}{fragments}}) {
+    for my $fragment (keys %$fragments) {
         if (defined($p->{Anchors}{$fragment}) ||
             &escape_match($fragment, $p->{Anchors}) ||
             grep { $_ eq "$uri#$fragment" } @{$Opts{Suppress_Fragment}})
@@ -2237,7 +2278,7 @@
 sub escape_match ($\%)
 {
     my ($a, $hash) = (URI::Escape::uri_unescape($_[0]), $_[1]);
-    foreach my $b (keys %$hash) {
+    for my $b (keys %$hash) {
         return 1 if ($a eq URI::Escape::uri_unescape($b));
     }
     return 0;
@@ -2467,7 +2508,7 @@
 EOF
     print("\n");
 
-    foreach my $anchor (@errors) {
+    for my $anchor (@errors) {
         my $format;
         my @unique = &sort_unique(
             map { line_number($_) }
@@ -2499,7 +2540,7 @@
 
     # Process each URL
     my ($c, $previous_c);
-    foreach my $u (@$urls) {
+    for my $u (@$urls) {
         my @fragments = keys %{$broken->{$u}{fragments}};
 
         # Did we get a redirect?
@@ -2508,7 +2549,7 @@
         # List of lines
         my @total_lines;
         push(@total_lines, keys(%{$links->{$u}{location}}));
-        foreach my $f (@fragments) {
+        for my $f (@fragments) {
             push(@total_lines, keys(%{$links->{$u}{fragments}{$f}}))
                 unless ($f eq $u && defined($links->{$u}{$u}{LINE_UNKNOWN()}));
         }
@@ -2687,7 +2728,7 @@
         }
 
         # Fragments
-        foreach my $f (@fragments) {
+        for my $f (@fragments) {
             my @unique_lines =
                 &sort_unique(keys %{$links->{$u}{fragments}{$f}});
             my $plural = (scalar(@unique_lines) > 1) ? 's' : '';
@@ -2767,10 +2808,8 @@
         RC_ROBOTS_TXT() => sprintf(
             'The link was not checked due to %srobots exclusion rules%s. Check the link manually, and see also the link checker %sdocumentation on robots exclusion%s.',
             $Opts{HTML} ? (
-                '<a href="http://www.robotstxt.org/robotstxt.html">',
-                '</a>',
-                "<a href=\"$Cfg{Doc_URI}#bot\">",
-                '</a>'
+                '<a href="http://www.robotstxt.org/robotstxt.html">', '</a>',
+                "<a href=\"$Cfg{Doc_URI}#bot\">",                     '</a>'
                 ) : ('') x 4
         ),
         RC_DNS_ERROR() =>
@@ -2840,7 +2879,7 @@
         # Sort the URI's by HTTP Code
         my %code_summary;
         my @idx;
-        foreach my $u (@urls) {
+        for my $u (@urls) {
             if (defined($results->{$u}{location}{record})) {
                 my $c = &code_shown($u, $results);
                 $code_summary{$c}++;
@@ -2878,7 +2917,7 @@
 </thead>
 <tbody>
 EOF
-            foreach my $code (sort(keys(%code_summary))) {
+            for my $code (sort(keys(%code_summary))) {
                 printf('<tr%s>', &bgcolor($code));
                 printf('<td><a href="#d%scode_%s">%s</a></td>',
                     $doc_count, $code, http_rc($code));
@@ -2934,16 +2973,16 @@
 # HTML interface #
 ##################
 
-sub html_header ($;$$)
-{
-    my ($uri, $doform, $cookie) = @_;
+sub html_header ($$)
+{
+    my ($uri, $cookie) = @_;
 
     my $title = defined($uri) ? $uri : '';
     $title = ': ' . $title if ($title =~ /\S/);
 
     my $headers = '';
     if (!$Opts{Command_Line}) {
-        $headers .= "Cache-Control: no-cache\nPragma: no-cache\n" if $doform;
+        $headers .= "Cache-Control: no-cache\nPragma: no-cache\n" if $uri;
         $headers .= "Content-Type: text/html; charset=utf-8\n";
         $headers .= "Set-Cookie: $cookie\n"                       if $cookie;
 
@@ -2952,40 +2991,14 @@
         $headers .= "Content-Language: en\n\n";
     }
 
-    my $script = my $onload = '';
-    if ($doform) {
-        $script = <<'EOF';
-<script type="text/javascript">
-function uriOk(num)
-{
-  if (document.getElementById) {
-    var u = document.getElementById('uri_' + num);
-    var ok = false;
-    if (u.value.length > 0) {
-      if (u.value.search) {
-        ok = (u.value.search(/\S/) !== -1);
-      } else {
-        ok = true;
-      }
-    }
-    if (! ok) {
-      u.focus();
-    }
-    return ok;
-  }
-  return true;
-}
-</script>
-EOF
-        $onload =
-            ' onload="if(document.getElementById){document.getElementById(\'uri_1\').focus()}"';
-    }
+    my $onload = $uri ? '' :
+          ' onload="if(document.getElementById){document.getElementById(\'uri_1\').focus()}"';
 
     print $headers, $DocType, "
 <html lang=\"en\" xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">
 <head>
 <title>W3C Link Checker", &encode($title), "</title>
-", $Head, $script, "</head>
+",      $Head,   "</head>
 <body", $onload, '>';
     &banner($title);
     return;
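
The distribute_links() routine added in this revision reorders links so that consecutive requests tend to go to different host:port pairs, which gives the per-host RobotUA delay time to elapse while other hosts are being checked. The following is a minimal round-robin sketch of the same idea, not the upstream implementation; interleave_by_host and the example hosts are hypothetical. It takes one link per host per pass instead of assigning into precomputed index slots, because the slice positions in the patch appear to collide when there are more chunks than links on the busiest host (for example, three hosts with two links each).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of checklink: interleave links round-robin
# by host so that consecutive requests rarely hit the same host.
sub interleave_by_host {
    my ($hostlinks) = @_;    # { "host:port" => [ uri, ... ] }

    # Hosts with the most links first; ties broken by name for stable output.
    my @hosts = sort {
        scalar(@{$hostlinks->{$b}}) <=> scalar(@{$hostlinks->{$a}}) ||
            $a cmp $b
    } keys %$hostlinks;

    # Work on copies so the caller's lists are left untouched.
    my %queue = map { $_ => [@{$hostlinks->{$_}}] } @hosts;

    my @result;
    while (@hosts) {

        # Take one link from every host that still has some left.
        push(@result, shift(@{$queue{$_}})) for @hosts;
        @hosts = grep { scalar(@{$queue{$_}}) } @hosts;
    }
    return @result;
}

my @order = interleave_by_host(
    {   'a.example:80' => ['http://a.example/1', 'http://a.example/2'],
        'b.example:80' => ['http://b.example/1'],
    }
);
print "$_\n" for @order;
```

For the two hosts above this prints http://a.example/1, http://b.example/1 and http://a.example/2, in that order, and every input link appears exactly once in the result.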

Modified: trunk/w3c-linkchecker/bin/checklink.pod
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/bin/checklink.pod?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/bin/checklink.pod (original)
+++ trunk/w3c-linkchecker/bin/checklink.pod Sun Apr  3 21:54:29 2011
@@ -160,6 +160,12 @@
 
 Timeout for requests, in seconds.  The default is 30.
 
+=item B<-C, --connection-cache> I<number>
+
+Maximum number of cached connections.  Using this option overrides the
+C<Connection_Cache_Size> configuration file parameter, see its
+documentation below for the default value and more information.
+
 =item B<-d, --domain> I<domain>
 
 Perl regular expression describing the domain to which the authentication
@@ -234,12 +240,17 @@
    CSS_Validator_URI =
      http://jigsaw.w3.org/css-validator/validator?uri=%s
 
-C<Doc_URI> and C<Style_URI> are URIs used for linking to the documentation
-and style sheet from the dynamically generated content of the link checker.
-The defaults are:
+C<Doc_URI> is a URI used for linking to the documentation, and CSS and
+JavaScript files in the dynamically generated content of the link checker.
+The default is:
 
    Doc_URI = http://validator.w3.org/docs/checklink.html
-   Style_URI = http://validator.w3.org/docs/linkchecker.css
+
+C<Connection_Cache_Size> is an integer denoting the maximum number of
+connections the link checker will keep open at any given time.  The
+default is:
+
+   Connection_Cache_Size = 2
 
 =back
 

Modified: trunk/w3c-linkchecker/debian/changelog
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/debian/changelog?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/debian/changelog (original)
+++ trunk/w3c-linkchecker/debian/changelog Sun Apr  3 21:54:29 2011
@@ -1,3 +1,11 @@
+w3c-linkchecker (4.8-1) UNRELEASED; urgency=low
+
+  TODO: check javascript file carefully
+
+  * New upstream release
+
+ -- Nicholas Bamber <nicholas at periapt.co.uk>  Sun, 03 Apr 2011 22:54:51 +0100
+
 w3c-linkchecker (4.7-1) unstable; urgency=low
 
   [ gregor herrmann ]

Modified: trunk/w3c-linkchecker/etc/checklink.conf
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/etc/checklink.conf?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/etc/checklink.conf (original)
+++ trunk/w3c-linkchecker/etc/checklink.conf Sun Apr  3 21:54:29 2011
@@ -44,8 +44,10 @@
 #
 # Doc_URI is the URI to the Link Checker documentation, shown in the
 # results report in CGI mode, and the usage message in command line mode.
-# If you have installed the documentation locally somewhere, you may wish to
-# change this to point to that version.  This must be an absolute URI.
+# The URIs to the CSS and JavaScript files in the generated HTML are also
+# formed using this as their base URI.  If you have installed the documentation
+# locally somewhere, you may wish to change this to point to that location.
+# This must be an absolute URI.
 #
 # Default:
 # Doc_URI = http://validator.w3.org/docs/checklink.html
@@ -59,3 +61,11 @@
 #
 # Default:
 # Forbidden_Protocols = javascript,mailto
+
+
+#
+# Connection_Cache_Size is an integer denoting the maximum number of
+# connections the link checker will keep open at any given time.
+#
+# Default:
+# Connection_Cache_Size = 2
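
The Connection_Cache_Size default can also be overridden per run with the new -C/--connection-cache option. The sketch below shows how such an option specification behaves with Getopt::Long, assuming a stand-alone %opts hash and a made-up argument list in place of @ARGV; it is an illustration, not an excerpt from checklink.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Keep -C distinct from a lowercase -c option such as -c/--cookies.
Getopt::Long::Configure('no_ignore_case');

# Default mirrors the documented Connection_Cache_Size value.
my %opts = (Connection_Cache_Size => 2);

# Hypothetical argument list standing in for @ARGV.
my @args = ('--connection-cache', '4', 'http://www.example.org/');

GetOptionsFromArray(\@args,
    'C|connection-cache=i' => \$opts{Connection_Cache_Size},)
    or die "bad options\n";

print "cache size: $opts{Connection_Cache_Size}\n";
print "remaining:  @args\n";
```

After parsing, the cache size is 4 and only the URI is left in the argument list; without the option the value stays at the default of 2.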

Modified: trunk/w3c-linkchecker/lib/W3C/LinkChecker.pm
URL: http://svn.debian.org/wsvn/pkg-perl/trunk/w3c-linkchecker/lib/W3C/LinkChecker.pm?rev=72213&op=diff
==============================================================================
--- trunk/w3c-linkchecker/lib/W3C/LinkChecker.pm (original)
+++ trunk/w3c-linkchecker/lib/W3C/LinkChecker.pm Sun Apr  3 21:54:29 2011
@@ -2,5 +2,5 @@
 package W3C::LinkChecker;
 use strict;
 use vars qw($VERSION);
-$VERSION = "4.7";
+$VERSION = "4.8";
 1;



