Bug#733111: update uscan Files-Excluded parsing to support escaped names
Nicolas Boulenguez
nicolas at debian.org
Mon Feb 3 23:19:32 UTC 2014
Package: devscripts
Followup-For: Bug #733111
Hello. I believe that this bug should be reopened.
Having described \* \\ and \?, [1] states that "Any other character
following a backslash is an error.". My understanding is:
- The 'foo\ bar' glob discussed in this bug report is illegal.
- It is not needed because 'foo?bar' matches 'foo bar'.
If so, the trick should be added to #688481, and splitting globs with
\s+ is correct.
The current implementation removes all trailing / from each glob. This
contradicts [1]: 'foo/' is a valid glob, though it will never match
anything. This is the point of #688481.
The current implementation relies on 'find' to interpret the pattern,
making the brackets metacharacters. [1] explicitly states that "[]
wildcards are not recognized".
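The rules quoted from [1] can be checked in isolation with a small sketch. The helper name glob_to_regex is hypothetical and not part of uscan; it reuses the substitutions from the code later in this message, and anchors the whole string rather than a path under the source directory.

```perl
use strict;
use warnings;

# Hypothetical helper (not in uscan): translate one copyright-format
# glob into an anchored Perl regex, following the rules quoted from [1].
sub glob_to_regex {
    my ($glob) = @_;
    # A backslash may only be followed by \ * or ?.
    die "misplaced backslash in \"$glob\"\n"
        if $glob =~ m/(?<!\\)\\(?![\\*?])/;
    my $regex = $glob;
    # Escape everything that could be a Perl metacharacter, brackets included.
    $regex =~ s/(?<!\\)([^A-Za-z_0-9*?\\])/\\$1/g;
    # Unescaped * and ? become their regex equivalents.
    $regex =~ s/(?<!\\)\*/.*/g;
    $regex =~ s/(?<!\\)\?/./g;
    return qr/^$regex$/s;
}

# 'foo?bar' covers a name containing a space.
print "space matched\n" if 'foo bar' =~ glob_to_regex('foo?bar');
# Brackets are literal, so 'foo[ab]' does not match 'fooa'.
print "brackets literal\n" unless 'fooa' =~ glob_to_regex('foo[ab]');
# 'foo\ bar' is rejected as an error.
print "rejected\n" unless eval { glob_to_regex('foo\ bar'); 1 };
```

With this translation a space never has to appear inside a glob, so splitting the field on whitespace stays safe.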
I suggest using something inspired by the Perl code below. It
should also be more efficient, being equivalent to a single 'find'
invocation instead of (2 + number of globs).
The behaviour is different when the "foo" pattern matches a non-empty
directory (non-empty after the depth-first traversal). There is no
practical need for separate handling, as 'foo*' removes the whole tree
in a more consistent and readable way. Maybe the trick should also be
described in #688481.
Am I missing something?
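To illustrate the difference between 'foo' and 'foo*', here is a minimal sketch that applies the two anchored regexes to a list of paths. The top directory name and the paths are made up for the example, and the regexes are written out by hand in the form the translation is expected to produce.

```perl
use strict;
use warnings;

# Hypothetical top directory and tree, listed deepest first,
# which is the order a depth-first traversal visits them in.
my $top   = 'pkg-1.0';
my @paths = ("$top/foo/a/b", "$top/foo/a", "$top/foo", "$top/food");

# Anchored regexes as the translation would build them for each glob.
my %regex = (
    'foo'  => qr/^\Q$top\E\/foo$/s,
    'foo*' => qr/^\Q$top\E\/foo.*$/s,
);

# Return the paths a given glob would select.
sub matched {
    my ($glob) = @_;
    return [ grep { $_ =~ $regex{$glob} } @paths ];
}

for my $glob ('foo', 'foo*') {
    print "$glob: @{ matched($glob) }\n";
}
```

'foo' selects only the directory itself, so removing it fails while it still has contents. 'foo*' selects every path below it as well, because the translated * also matches slashes. Note that it also selects siblings such as 'food'.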
# preamble (maybe with another logging system)
use File::Find;
use Log::Log4perl qw(:easy);
Log::Log4perl->easy_init ($WARN);
my $logger = get_logger;

# Replacement suggestion for lines 1541 to 1558 of [2]
my @regexs = ();
for my $glob (split (/\s+/, $data->{"files-excluded"})) {
    # Complain of any misplaced backslash.
    if ($glob =~ m/(?<!\\)\\(?![\\*?])/) {
        $logger->logdie ("$copyright_file: \\ not followed by *?\\ in pattern: \"$glob\"");
    }
    my $regex = $glob;
    # Escape potential Perl meta characters except unescaped * \ ?.
    $regex =~ s/(?<!\\)([^A-Za-z_0-9\*\?\\])/\\$1/g;
    # Translate unescaped * \ ? into Perl equivalents.
    $regex =~ s/(?<!\\)\*/\.\*/g;
    $regex =~ s/(?<!\\)\?/\./g;
    # Anchor the regex with the escaped top directory path.
    $regex = qr/^\Q$main_source_dir\E\/$regex$/s;
    $logger->debug ("translated glob \"$glob\" to regex \"$regex\"");
    push (@regexs, $regex);
}
my $something_actually_excluded = 0;
# finddepth visits the contents of a directory before the directory itself.
File::Find::finddepth (sub {
    for my $regex (@regexs) {
        if ($File::Find::name =~ m/$regex/) {
            if (-d $File::Find::name) {
                if (rmdir $File::Find::name) {
                    $something_actually_excluded = 1;
                    $logger->debug ("rmdir \"$File::Find::name\"");
                } elsif ($!{ENOTEMPTY} or $!{EEXIST}) {
                    $logger->logdie ("Cannot exclude non empty directory \"$File::Find::name\". Use a glob ending with * to remove a tree.");
                } else {
                    $logger->error ("rmdir \"$File::Find::name\": $!");
                }
            } elsif (unlink $File::Find::name) {
                $something_actually_excluded = 1;
                $logger->debug ("unlink \"$File::Find::name\"");
            } else {
                $logger->error ("unlink \"$File::Find::name\": $!");
            }
            # The first matching glob decides; skip the remaining regexes.
            last;
        }
    }
}, $main_source_dir);
if (! $something_actually_excluded) {
[1] http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#files-field
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/devscripts.git;a=blob;f=scripts/uscan.pl;h=7314a5c0a567f5639f91c6126cd44780a48df92d;hb=69329b529522b2e84ea2c2e20d7e6e4d72e13c75