Bug#733111: update uscan Files-Excluded parsing to support escaped names

Nicolas Boulenguez nicolas at debian.org
Mon Feb 3 23:19:32 UTC 2014


Package: devscripts
Followup-For: Bug #733111

Hello.  I believe that this bug should be reopened.

After describing \*, \\ and \?, [1] states that "Any other character
following a backslash is an error." My understanding is:
- The 'foo\ bar' glob discussed in this bug report is illegal.
- It is not needed, because 'foo?bar' already matches 'foo bar'.
If so, this trick should be documented in #688481, and splitting globs
on \s+ is correct.
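A minimal Perl sketch of these escaping rules follows. The helper name glob_to_regex is hypothetical, not part of uscan. It walks the glob token by token, so an escaped backslash is consumed whole and cannot start another escape. It rejects 'foo\ bar' and shows that 'foo?bar' matches a name containing a space.

```perl
use strict;
use warnings;

# Hypothetical helper: translate one Files-Excluded glob into an anchored
# Perl regex, following the escaping rules quoted from [1].
# Dies on any backslash not followed by *, ? or another backslash.
sub glob_to_regex {
    my ($glob) = @_;
    my $regex = "";
    # Each token is a backslash plus the character after it (none if the
    # backslash ends the glob), or a single plain character.
    for my $token ($glob =~ m/(\\.?|.)/gs) {
        if    ($token eq "*")               { $regex .= ".*"; }
        elsif ($token eq "?")               { $regex .= "."; }
        elsif ($token =~ m/^\\([\\*?])\z/s) { $regex .= quotemeta ($1); }
        elsif ($token =~ m/^\\/)            { die "illegal escape in \"$glob\"\n"; }
        else                                { $regex .= quotemeta ($token); }
    }
    return qr/^$regex\z/s;
}

# 'foo\ bar' is rejected, while 'foo?bar' matches a name with a space.
my $verdict = eval { glob_to_regex ('foo\ bar'); 1 } ? "accepted\n" : "rejected: $@";
print $verdict;
my $space = ("foo bar" =~ glob_to_regex ('foo?bar')) ? "match\n" : "no match\n";
print $space;
```

Tokenizing, rather than using a (?<!\\) lookbehind, keeps 'a\\*' (a literal backslash followed by a wildcard) distinct from 'a\*' (a literal star).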

The current implementation removes all trailing / from each glob. This
contradicts [1]: 'foo/' is a valid glob, though it will never match
anything. This is the point of #688481.
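The reason 'foo/' never matches is that File::Find reports paths without a trailing slash. The Perl sketch below assumes a throwaway tree built with File::Temp, standing in for the unpacked upstream tarball. It compares the literal regex for 'foo/' with the one obtained once the trailing slash is stripped, as the current implementation does.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Throwaway tree <top>/foo/bar (an assumption for the demo).
my $top = tempdir (CLEANUP => 1);
make_path ("$top/foo");
open (my $fh, '>', "$top/foo/bar") or die "$top/foo/bar: $!";
close ($fh);

# Collect every path exactly as File::Find reports it.
my @names;
find ({ no_chdir => 1, wanted => sub { push (@names, $File::Find::name) } }, $top);

# 'foo/' has no wildcard, so its regex is simply the quoted literal path.
my $re_literal  = quotemeta ("$top/foo/");
my $re_stripped = quotemeta ("$top/foo");    # trailing / removed
my @hits_literal  = grep { m/^$re_literal\z/ } @names;
my @hits_stripped = grep { m/^$re_stripped\z/ } @names;
print scalar (@hits_literal), " path(s) match 'foo/', ",
      scalar (@hits_stripped), " once the / is stripped\n";
```

Stripping the slash silently turns a pattern that matches nothing into one that matches the directory foo.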

The current implementation relies on 'find' to interpret the pattern,
which makes brackets metacharacters, whereas [1] explicitly states
that "[] wildcards are not recognized".
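A short Perl sketch of treating brackets literally, assuming a hypothetical simplified helper simple_glob_to_regex that ignores backslash escapes: * and ? are translated, and every other character goes through quotemeta. With 'foo[ab]*', only the literal name foo[ab].c is excluded. By contrast, `find -name 'foo[ab]*'` would treat [ab] as a character class and match fooa.c instead.

```perl
use strict;
use warnings;

# Hypothetical simplified translation that ignores backslash escapes:
# * becomes .*, ? becomes ., and every other character is quoted.
sub simple_glob_to_regex {
    my ($glob) = @_;
    my $body = join ('', map { ($_ eq '*') ? '.*' : ($_ eq '?') ? '.' : quotemeta ($_) }
                         split (//, $glob));
    return qr/^$body\z/s;
}

my $regex = simple_glob_to_regex ('foo[ab]*');
for my $name ('foo[ab].c', 'fooa.c') {
    printf ("%-10s %s\n", $name, ($name =~ $regex) ? "excluded" : "kept");
}
```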

I suggest using something inspired by the Perl code below. It should
also be more efficient, being equivalent to a single 'find' invocation
instead of 2 + (number of globs).

The behaviour differs when the "foo" pattern matches a non-empty
directory (one that is still non-empty after the depth-first
traversal). There is no practical need to handle this case separately,
as 'foo*' removes the whole tree in a more consistent and readable
way. Maybe this trick should also be described in #688481.
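The difference between 'foo' and 'foo*' on a non-empty directory can be checked with a small Perl sketch that mirrors the depth-first removal in the proposed code. The helper names exclude and make_tree, the File::Temp tree, and the hand-translated regexes are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: depth-first, remove every path below $top whose
# relative name matches $regex; return the paths that could not be removed.
sub exclude {
    my ($top, $regex) = @_;
    my @failed;
    finddepth ({ no_chdir => 1, wanted => sub {
        my $name = $File::Find::name;
        return unless $name =~ m/^\Q$top\E\/$regex\z/s;
        my $ok = (-d $name and not -l $name) ? rmdir ($name) : unlink ($name);
        push (@failed, $name) unless $ok;
    } }, $top);
    return @failed;
}

# Hypothetical helper: throwaway tree <top>/foo/bar.
sub make_tree {
    my $top = tempdir (CLEANUP => 1);
    make_path ("$top/foo");
    open (my $fh, '>', "$top/foo/bar") or die "$top/foo/bar: $!";
    close ($fh);
    return $top;
}

# 'foo' matches only the directory, which is still non-empty: rmdir fails.
my $top1 = make_tree ();
my @failed1 = exclude ($top1, qr/foo/);
# 'foo*', translated by hand to foo.*, also matches foo/bar, which is
# unlinked before foo itself is visited, so rmdir then succeeds.
my $top2 = make_tree ();
my @failed2 = exclude ($top2, qr/foo.*/);
print "'foo': ", scalar (@failed1), " failure(s); 'foo*': ",
      (-e "$top2/foo" ? "tree kept" : "tree removed"), "\n";
```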

Am I missing something?



# preamble (maybe with another logging system)
use File::Find;
use Log::Log4perl qw(:easy);
Log::Log4perl->easy_init ($WARN);
my $logger = get_logger;


# Replacement suggestion for lines 1541 to 1558 of [2]

my @regexs = ();
for my $glob (split (/\s+/, $data->{"files-excluded"})) {
    my $regex = "";
    # Tokenize left to right so that an escaped backslash (\\) is never
    # mistaken for the start of another escape: a (?<!\\) lookbehind
    # cannot tell \\* (literal \ then wildcard) from \* (literal *).
    for my $token ($glob =~ m/(\\.?|.)/gs) {
        if ($token eq "*") {
            $regex .= ".*";                 # unescaped *: any sequence
        } elsif ($token eq "?") {
            $regex .= ".";                  # unescaped ?: any one character
        } elsif ($token =~ m/^\\([\\*?])\z/s) {
            $regex .= quotemeta ($1);       # \* \? \\: the literal character
        } elsif ($token =~ m/^\\/) {
            # Complain of any misplaced backslash.
            $logger->logdie ("$copyright_file: \\ not followed by *?\\ in pattern: \"$glob\"");
        } else {
            $regex .= quotemeta ($token);   # anything else, brackets included, is literal
        }
    }
    # Anchor the regex with the escaped top directory path.
    $regex = qr/^\Q$main_source_dir\E\/$regex\z/s;
    $logger->debug ("translated glob \"$glob\" to regex \"$regex\"");
    push (@regexs, $regex);
}
my $something_actually_excluded = 0;
# no_chdir keeps $File::Find::name valid even if $main_source_dir is
# relative; finddepth visits a directory only after its contents.
File::Find::finddepth ({ no_chdir => 1, wanted => sub {
    for my $regex (@regexs) {
        if ($File::Find::name =~ m/$regex/) {
            # A symlink to a directory is unlinked, not rmdir'ed.
            if (-d $File::Find::name and not -l $File::Find::name) {
                if (rmdir $File::Find::name) {
                    $something_actually_excluded = 1;
                    $logger->debug ("rmdir \"$File::Find::name\"");
                } elsif ($!{ENOTEMPTY} or $!{EEXIST}) {
                    $logger->logdie ("Cannot exclude non-empty directory \"$File::Find::name\". Use a glob ending with * to remove a tree.");
                } else {
                    $logger->error ("rmdir \"$File::Find::name\": $!");
                }
            } elsif (unlink $File::Find::name) {
                $something_actually_excluded = 1;
                $logger->debug ("unlink \"$File::Find::name\"");
            } else {
                $logger->error ("unlink \"$File::Find::name\": $!");
            }
            last;
        }
    }
} }, $main_source_dir);
if (! $something_actually_excluded) {


[1] http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#files-field
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/devscripts.git;a=blob;f=scripts/uscan.pl;h=7314a5c0a567f5639f91c6126cd44780a48df92d;hb=69329b529522b2e84ea2c2e20d7e6e4d72e13c75
