[Shootout-list] regex bug
skaller
skaller@users.sourceforge.net
11 Nov 2004 02:33:00 +1100
Might I point out that the result data for this test is incorrect,
and that every language passing the test is wrong.
The phone number on this line:
foo (213 222-2222 bar
which would be formatted as (213) 222-2222, does in fact meet the
test specification.
The Felix version uses this specification, which is
manifestly correct:
regexp digit = ["0123456789"];
regexp digits3 = digit digit digit;
regexp digits4 = digits3 digit;
regexp area_code = digits3 | "(" digits3 ")";
regexp exchange = digits3;
regexp phone = area_code " " exchange (" " | "-") digits4;
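For readers who do not use Felix, here is a rough Python rendering of the same specification. It is only a sketch: the names mirror the Felix definitions, the helper find_phone is made up for illustration, and Python's re module stands in for the Felix lexer. It uses plain alternation for the area code and no conditional.

```python
import re

# Mirror the Felix definitions one by one (names are illustrative).
digit = r"[0-9]"
digits3 = digit * 3                          # digit digit digit
digits4 = digits3 + digit                    # digits3 digit
area_code = rf"(?:{digits3}|\({digits3}\))"  # 999 or (999)
exchange = digits3
phone = rf"{area_code} {exchange}[ -]{digits4}"

PHONE_RE = re.compile(phone)


def find_phone(line):
    """Return the first substring of line matching the phone pattern, or None."""
    m = PHONE_RE.search(line)
    return m.group(0) if m else None
```

With these definitions, find_phone("foo (213 222-2222 bar") returns "213 222-2222": the unmatched "(" is simply skipped and the bare area code alternative matches.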
The PCRE regexp, on the other hand, is wrong, though the
error is very hard to see:
"(?:^|[^\\d\\(])" /* must be preceded by non-digit */
"(\\()?" /* match 1: possible initial left paren */
"(\\d\\d\\d)" /* match 2: area code is 3 digits */
"(?(1)\\))" /* if match1 then match right paren */
"[ ]" /* area code followed by one space */
"(\\d\\d\\d)" /* match 3: prefix of 3 digits */
"[ -]" /* separator is either space or dash */
"(\\d\\d\\d\\d)" /* match 4: last 4 digits */
"\\D" /* must be followed by a non-digit */
;
This pattern uses a non-regular feature: it conditionally matches
the right bracket only if a left bracket was found.
It picks up the left bracket in this line:
foo (213 222-2222 bar
but fails to find a matching right bracket, and
so rejects the line. But the line should match: that (
is a non-digit preceding a phone number, as required.
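The rejection can be reproduced outside C. The sketch below assumes Python's re module, which accepts the same (?(1)...) conditional syntax, and transcribes the pattern piece by piece; the name SHOOTOUT_RE is made up here. Note that the first character class excludes "(" as well as digits, so the only way to get past the "(" is to capture it as group 1, and that in turn demands the ")".

```python
import re

# Piece-by-piece transcription of the shootout pattern (C escapes removed).
SHOOTOUT_RE = re.compile(
    r"(?:^|[^\d\(])"  # start of line, or a char that is neither digit nor "("
    r"(\()?"          # group 1: optional left paren
    r"(\d\d\d)"       # group 2: area code
    r"(?(1)\))"       # if group 1 matched, require a right paren
    r"[ ]"            # one space
    r"(\d\d\d)"       # group 3: prefix
    r"[ -]"           # space or dash
    r"(\d\d\d\d)"     # group 4: last four digits
    r"\D"             # followed by a non-digit
)

# The disputed line is rejected ...
rejected = SHOOTOUT_RE.search("foo (213 222-2222 bar") is None
# ... while the fully bracketed and the bare forms are accepted.
bracketed = SHOOTOUT_RE.search("foo (213) 222-2222 bar")
bare = SHOOTOUT_RE.search("foo 213 222-2222 bar")
```

Here rejected is True, while bracketed and bare are both match objects whose group 2 is "213".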
The correct regexp is shown in the Felix code;
the area code is an alternation like this:
(999) | 999
I have no idea how to do the capture (captures are not
well defined anyhow .. and pcre doesn't do what it says
either .. but that's another story ..)
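One possible way to get captures out of the alternation, sketched in Python rather than PCRE or Felix: give each alternative its own group and take whichever one matched. The names PHONE_CAPTURE_RE and format_phone are hypothetical, and the leading \D (any non-digit, "(" included) is my reading of the requirement that the number be preceded by a non-digit. The trailing \D is kept from the pattern above.

```python
import re

PHONE_CAPTURE_RE = re.compile(
    r"(?:^|\D)"                   # start of line or any non-digit, "(" included
    r"(?:\((\d\d\d)\)|(\d\d\d))"  # group 1: (999) form; group 2: bare 999 form
    r" (\d\d\d)"                  # group 3: exchange
    r"[ -](\d\d\d\d)"             # group 4: last four digits
    r"\D"                         # followed by a non-digit
)


def format_phone(line):
    """Return the first phone number in line as "(ddd) ddd-dddd", or None."""
    m = PHONE_CAPTURE_RE.search(line)
    if m is None:
        return None
    area = m.group(1) or m.group(2)  # exactly one of the two groups is set
    return f"({area}) {m.group(3)}-{m.group(4)}"
```

With this sketch, format_phone("foo (213 222-2222 bar") gives "(213) 222-2222", and a number preceded by a digit, as in "foo 1213 222-2222 bar", is still rejected.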
--
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net