[Po4a-devel]Two small issues with Po.pm
Nicolas François
nicolas.francois@centraliens.net
Sat, 11 Dec 2004 01:54:39 +0100
--pf9I7BMVVzbSWLtt
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hello,
One of the patches was uploaded. The other one (\n preceded by an even
number of \) is more complicated.
This issue appears in unescape_text (\\n is not tranlated to an end of
line, which causes formatting issues) and in quote_text (in no-wrap mode,
no newline is added after a \\n, but I think it has no implication on the
output document).
There may also be the same issue in unquote_text, but I could not find a
way to trigger it.
My solution is maybe too complicated:
+ # unescape newlines
+ # NOTE on \G:
+ # The following regular expression introduce newlines.
+ # Thus, ^ doesn't match all beginnings of lines.
+ # \G is a zero-width assertion that matches the position
+ # of the previous substitution with s///g. As every
+ # substitution ends by a newline, it always matches a
+ # position just after a newline.
+ $text =~ s/( # $1:
+ (\G|[^\\]) # beginning of the line or any char
+ # different from '\'
+ (\\\\)* # followed by any even number of '\'
+ )\\n # and followed by an escaped newline
+ /$1\n/sgx; # single string, match globally, allow comments
+ # unescape tabulations
+ $text =~ s/( # $1:
+ (^|[^\\]) # beginning of the line or any char
+ # different from '\'
+ (\\\\)* # followed by any even number of '\'
+ )\\t # and followed by an escaped tabulation
+ /$1\t/mgx; # multilines string, match globally, allow comments
Do you have another idea to avoid using (and documenting) \G?
In canonize, I don't thing it is needed to protect \n against a preceding \,
and I'm wondering if
$text =~ s/([^\\])\n/$1 /gm;
$text =~ s/ \n/ /gm;
$text =~ s/([^\\])\n/$1 /gm;
should not simply be changed to:
$text =~ s/\n/ /gm if ($text ne "\n");
(do not do anything if ($text eq "\n"), because it mess up the first
string - the header)
Regards,
--
Nekral
--pf9I7BMVVzbSWLtt
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="Po.pm.escape.patch"
Index: lib/Locale/Po4a/Po.pm
===================================================================
RCS file: /cvsroot/po4a/po4a/lib/Locale/Po4a/Po.pm,v
retrieving revision 1.31
diff -a -u -p -r1.31 Po.pm
--- lib/Locale/Po4a/Po.pm 5 Dec 2004 19:24:29 -0000 1.31
+++ lib/Locale/Po4a/Po.pm 11 Dec 2004 00:50:21 -0000
@@ -972,9 +972,28 @@ sub unescape_text {
print STDERR "\nunescape [$text]====" if $debug{'escape'};
$text = join("",split(/\n/,$text));
$text =~ s/\\"/"/g;
- $text =~ s/([^\\])\\n/$1\n/g;
- $text =~ s/^\\n/\n/mg;
- $text =~ s/([^\\])\\t/$1\t/g;
+ # unescape newlines
+ # NOTE on \G:
+ # The following regular expression introduce newlines.
+ # Thus, ^ doesn't match all beginnings of lines.
+ # \G is a zero-width assertion that matches the position
+ # of the previous substitution with s///g. As every
+ # substitution ends by a newline, it always matches a
+ # position just after a newline.
+ $text =~ s/( # $1:
+ (\G|[^\\]) # beginning of the line or any char
+ # different from '\'
+ (\\\\)* # followed by any even number of '\'
+ )\\n # and followed by an escaped newline
+ /$1\n/sgx; # single string, match globally, allow comments
+ # unescape tabulations
+ $text =~ s/( # $1:
+ (^|[^\\]) # beginning of the line or any char
+ # different from '\'
+ (\\\\)* # followed by any even number of '\'
+ )\\t # and followed by an escaped tabulation
+ /$1\t/mgx; # multilines string, match globally, allow comments
+ # and unescape the escape character
$text =~ s/\\\\/\\/g;
print STDERR ">$text<\n" if $debug{'escape'};
@@ -1004,8 +1023,14 @@ sub quote_text {
return '""' unless defined($string) && length($string);
print STDERR "\nquote [$string]====" if $debug{'quote'};
- $string =~ s/([^\\])\\n/$1!!DUMMYPOPM!!/gm;
- $string =~ s|!!DUMMYPOPM!!|\\n\n|gm;
+ # break lines on newlines, if any
+ # see unescape_text for an explanation on \G
+ $string =~ s/( # $1:
+ (\G|[^\\]) # beginning of the line or any char
+ # different from '\'
+ (\\\\)* # followed by any even number of '\'
+ )\\n # and followed by an escaped newline
+ /$1\n/sgx; # single string, match globally, allow comments
$string = wrap($string);
my @string = split(/\n/,$string);
$string = join ("\"\n\"",@string);
@@ -1025,6 +1050,8 @@ sub unquote_text {
$string =~ s/^""\\n//s;
$string =~ s/^"(.*)"$/$1/s;
$string =~ s/"\n"//gm;
+ # Note: an even number of '\' could precede \\n, but I could not build a
+ # document to test this
$string =~ s/([^\\])\\n\n/$1!!DUMMYPOPM!!/gm;
$string =~ s|!!DUMMYPOPM!!|\\n|gm;
print STDERR ">$string<\n" if $debug{'quote'};
@@ -1032,15 +1059,20 @@ sub unquote_text {
}
# canonize the string: write it on only one line, changing consecutive whitespace to
-# only on space.
+# only one space.
# Warning, it changes the string and should only be called if the string is plain text
sub canonize {
my $text=shift;
print STDERR "\ncanonize [$text]====" if $debug{'canonize'};
$text =~ s/^ *//s;
- $text =~ s/([^\\])\n/$1 /gm;
- $text =~ s/ \n/ /gm;
- $text =~ s/([^\\])\n/$1 /gm;
+ # What about lines starting by a newline ?
+ # FIXME: needed here ?
+# $text =~ s/([^\\])\n/$1 /gm;
+# $text =~ s/ \n/ /gm;
+# $text =~ s/([^\\])\n/$1 /gm;
+ # FIXME: I rather like only this:
+ # if ($text eq "\n"), it messed up the first string (header)
+ $text =~ s/\n/ /gm if ($text ne "\n");
$text =~ s/([.)]) +/$1 /gm;
$text =~ s/([^.)]) */$1 /gm;
$text =~ s/ *$//s;
--pf9I7BMVVzbSWLtt--