[Shootout-list] regex bug

skaller skaller@users.sourceforge.net
11 Nov 2004 02:33:00 +1100


Might I point out the result data for this test is incorrect,
and all languages passing the test are wrong.

The phone number in this line:

foo (213 222-2222 bar

which is (213) 222-2222 when formatted, in fact meets the
test specification.

The Felix version uses this specification which is
manifestly correct:

regexp digit = ["0123456789"];
regexp digits3 = digit digit digit;
regexp digits4 = digits3 digit;

regexp area_code = digits3 | "(" digits3 ")";
regexp exchange = digits3;

regexp phone = area_code " " exchange (" " | "-") digits4;
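
For anyone without Felix to hand, the same definitions can be sketched
with Python's re module. This is only a transliteration under my own
assumptions: the variable names mirror the Felix ones, and like the Felix
spec it checks nothing about what surrounds the number.

```python
import re

# Transliteration of the Felix definitions; names mirror the spec above.
digit = r"[0-9]"
digits3 = digit * 3
digits4 = digits3 + digit

# area_code = digits3 | "(" digits3 ")"
area_code = rf"(?:{digits3}|\({digits3}\))"
exchange = digits3

# phone = area_code " " exchange (" " | "-") digits4
phone = re.compile(rf"{area_code} {exchange}[ -]{digits4}")

line = "foo (213 222-2222 bar"
m = phone.search(line)
print(m.group(0) if m else None)  # prints: 213 222-2222
```

The stray "(" is simply skipped, and the bare form of the area code
matches.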

On the other hand the pcre regexp is wrong, but it is
very hard to see the error:

"(?:^|[^\\d\\(])"		/* must be preceded by non-digit */
"(\\()?"			/* match 1: possible initial left paren */
"(\\d\\d\\d)"			/* match 2: area code is 3 digits */
"(?(1)\\))"			/* if match1 then match right paren */
"[ ]"				/* area code followed by one space */
"(\\d\\d\\d)"			/* match 3: prefix of 3 digits */
"[ -]"				/* separator is either space or dash */
"(\\d\\d\\d\\d)"		/* match 4: last 4 digits */
"\\D"				/* must be followed by a non-digit */
;

This match uses a non-regular feature, conditionally matching
the right bracket if a left bracket was found.

It picks up the left bracket in this line:

foo (213 222-2222 bar

but fails to find a matching right bracket, and
so rejects the line. But the line should match: that (
is a non-digit preceding a phone number, as required.
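
The behaviour can be reproduced outside C. Here is a rough check using
Python's re module, which also supports the (?(1)...) conditional; it is
the pattern quoted above with the C string escaping removed, and the
three sample lines are just illustrations. Note that the leading class
[^\d\(] also excludes "(" itself, so the engine cannot restart the match
treating the stray "(" as the preceding non-digit.

```python
import re

# The pattern quoted above, with the C string escaping removed.
conditional = re.compile(
    r"(?:^|[^\d\(])"   # preceded by start of line, or a char that is neither digit nor "("
    r"(\()?"           # group 1: optional left paren
    r"(\d\d\d)"        # group 2: area code
    r"(?(1)\))"        # if group 1 matched, require a right paren
    r"[ ]"             # one space
    r"(\d\d\d)"        # group 3: prefix
    r"[ -]"            # space or dash
    r"(\d\d\d\d)"      # group 4: last four digits
    r"\D"              # followed by a non-digit
)

samples = [
    "foo (213) 222-2222 bar",
    "foo 213 222-2222 bar",
    "foo (213 222-2222 bar",
]
for line in samples:
    print(repr(line), conditional.search(line) is not None)
# The first two lines report True, the last reports False.
```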

The correct regexp is shown in the Felix code;
the area code is like this:

	(999) | 999

I have no idea how to do the capture (captures are not
well defined anyhow .. and pcre doesn't do what it says
either .. but that's another story ..)
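
One possible way to handle the capture, offered as a sketch rather than
a fix for the C program: give each branch of the alternation its own
group and take whichever one took part in the match. The pattern and
the format_phone helper below are hypothetical, written against
Python's re module rather than the pcre C API.

```python
import re

# Hypothetical pattern: one capture group per branch of the area code.
alternation = re.compile(
    r"(?:^|\D)"                    # start of line or any non-digit, "(" included
    r"(?:\((\d\d\d)\)|(\d\d\d))"   # group 1: (999) form, group 2: bare 999 form
    r"[ ]"                         # one space
    r"(\d\d\d)"                    # group 3: exchange
    r"[ -]"                        # space or dash
    r"(\d\d\d\d)"                  # group 4: last four digits
    r"\D"                          # followed by a non-digit
)

def format_phone(line):
    """Return the number formatted as (999) 999-9999, or None if absent."""
    m = alternation.search(line)
    if m is None:
        return None
    area = m.group(1) or m.group(2)  # whichever branch matched
    return f"({area}) {m.group(3)}-{m.group(4)}"

print(format_phone("foo (213 222-2222 bar"))  # prints: (213) 222-2222
```

The cost is that the area code may land in either of two groups, so the
caller has to look at both.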


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net