r5561 - in /packages/libtext-csv-perl/trunk: CSV_XS.pm CSV_XS.xs ChangeLog MANIFEST META.yml README debian/changelog debian/control t/45_eol.t

eloy at users.alioth.debian.org
Fri Jun 1 14:48:56 UTC 2007


Author: eloy
Date: Fri Jun  1 14:48:56 2007
New Revision: 5561

URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=5561
Log:
new upstream version

Added:
    packages/libtext-csv-perl/trunk/t/45_eol.t
      - copied unchanged from r5560, packages/libtext-csv-perl/branches/upstream/current/t/45_eol.t
Modified:
    packages/libtext-csv-perl/trunk/CSV_XS.pm
    packages/libtext-csv-perl/trunk/CSV_XS.xs
    packages/libtext-csv-perl/trunk/ChangeLog
    packages/libtext-csv-perl/trunk/MANIFEST
    packages/libtext-csv-perl/trunk/META.yml
    packages/libtext-csv-perl/trunk/README
    packages/libtext-csv-perl/trunk/debian/changelog
    packages/libtext-csv-perl/trunk/debian/control

Modified: packages/libtext-csv-perl/trunk/CSV_XS.pm
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/CSV_XS.pm?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/CSV_XS.pm (original)
+++ packages/libtext-csv-perl/trunk/CSV_XS.pm Fri Jun  1 14:48:56 2007
@@ -28,7 +28,7 @@
 use DynaLoader ();
 
 use vars   qw( $VERSION @ISA );
-$VERSION = "0.26";
+$VERSION = "0.27";
 @ISA     = qw( DynaLoader );
 
 sub PV () { 0 }
@@ -191,7 +191,7 @@
 {
     my ($self, $idx, $val) = @_;
     ref $self->{_FFLAGS} &&
-	$idx >= 0 && $idx < @{$self->{_FFLAGS}} or return undef;
+	$idx >= 0 && $idx < @{$self->{_FFLAGS}} or return;
     $self->{_FFLAGS}[$idx] & 0x0001 ? 1 : 0;
     } # is_quoted
 
@@ -199,7 +199,7 @@
 {
     my ($self, $idx, $val) = @_;
     ref $self->{_FFLAGS} &&
-	$idx >= 0 && $idx < @{$self->{_FFLAGS}} or return undef;
+	$idx >= 0 && $idx < @{$self->{_FFLAGS}} or return;
     $self->{_FFLAGS}[$idx] & 0x0002 ? 1 : 0;
     } # is_binary
 
@@ -317,6 +317,43 @@
 comma-separated values.  An instance of the Text::CSV_XS class can combine
 fields into a CSV string and parse a CSV string into fields.
 
+The module accepts either strings or files as input and can utilize any
+user-specified characters as delimiters, separators, and escapes so it is
+perhaps better called ASV (anything separated values) rather than just CSV.
+
+=head2 Embedded newlines
+
+B<Important Note>: The default behaviour is to only accept ASCII characters.
+This means that fields can not contain newlines. If your data contains
+newlines embedded in fields, or characters above 0x7e (tilde), or binary data,
+you *must* set C<< binary => 1 >> in the call to C<new ()>.  To cover the widest
+range of parsing options, you will always want to set binary.
+
+Even with binary set, you still have to pass a complete record to the
+C<parse ()> method, and the usual line-by-line idiom does not guarantee
+that:
+
+ my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
+ while (<>) {
+     $csv->parse ($_);
+     my @fields = $csv->fields ();
+
+will break, because the C<while> loop reads physical lines and ignores
+quoting, so a quoted field with an embedded newline gets split. If you need
+to support embedded newlines, the way to go is either
+
+ use IO::Handle;
+ my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
+ while (my $row = $csv->getline (*ARGV)) {
+     my @fields = @$row;
+
+or, more safely, in perl 5.6 and up:
+
+ my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
+ open my $io, "<", $file or die "$file: $!";
+ while (my $row = $csv->getline ($io)) {
+     my @fields = @$row;
+
 =head1 FUNCTIONS
 
 =over 4
@@ -347,10 +384,25 @@
 default), C<"\012"> (Line Feed) or C<"\015\012"> (Carriage Return,
 Line Feed)
 
+If both C<$/> and C<eol> equal C<"\015">, lines that end in a lone
+Carriage Return (no Line Feed) are C<parse>d correctly.
+Line endings, whether in C<$/> or C<eol>, other than C<undef>,
+C<"\n">, C<"\r\n">, or C<"\r"> are not (yet) supported for parsing.
+
 =item escape_char
 
-The char used for escaping certain characters inside quoted fields,
-by default the same character. (C<">)
+The character used for escaping certain characters inside quoted fields.
+
+The C<escape_char> defaults to being the literal double-quote mark (C<">);
+in other words, the same as the default C<quote_char>. This means that
+doubling the quote mark in a field escapes it:
+
+  "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
+
+If you change the default quote_char without changing the default
+escape_char, the escape_char will still be the quote mark.  If instead
+you want to escape the quote_char by doubling it, you will need to change
+the escape_char to be the same as what you changed the quote_char to.
 
 The escape character can not be equal to the separation character.
 
@@ -440,7 +492,7 @@
 to the I<$io> object, typically an IO handle or any other object that
 offers a I<print> method. Note, this implies that the following is wrong:
 
- open FILE, ">whatever";
+ open FILE, ">", "whatever";
  $status = $csv->print (\*FILE, $colref);
 
 The glob C<\*FILE> is not an object, thus it doesn't have a print
@@ -693,11 +745,17 @@
 
 =head1 TODO
 
+=over 2
+
+=item eol
+
 Discuss an option to make the eol honor the $/ setting. Maybe
 
   my $csv = Text::CSV_XS->new ({ eol => $/ });
 
 is already enough, and new options only make things less opaque.
+
+=item setting meta info
 
 Future extensions might include extending the C<fields_flags ()>,
 C<is_quoted ()>, and C<is_binary ()> to accept setting these flags
@@ -707,6 +765,8 @@
   $csv->meta_info (0, 1, 1, 3, 0, 0);
   $csv->is_quoted (3, 1);
 
+=item parse returning undefined fields
+
 Adding an option that enables the parser to distinguish between
 empty fields and undefined fields, like
 
@@ -718,6 +778,55 @@
 Then would return (undef, "", "1", "2", undef, "") in @fld, instead
 of the current ("", "", "1", "2", "", "").
 
+=item combined methods
+
+Adding methods that combine C<combine ()> and C<string ()> in
+a single call. Likewise for C<parse ()> and C<fields ()>. Given the
+trouble with embedded newlines, maybe just allowing C<getline ()> and
+C<print ()> is sufficient.
+
+=item Unicode
+
+Make C<parse ()> and C<combine ()> do the right thing for Unicode
+(UTF-8) if requested. See t/50_utf8.t. More complicated, but equally
+important, also for C<getline ()> and C<print ()>.
+
+=item Space delimited separators
+
+Discuss if and how C<Text::CSV_XS> should/could support formats like
+
+   1 , "foo" , "bar" , 3.19 ,
+
+=item Double double quotes
+
+There seem to be applications around that write their dates like
+
+   1,4,""12/11/2004"",4,1
+
+If we would support that, in what way?
+
+=item Parse the whole file at once
+
+Implement a new method that enables the parsing of a complete file
+at once, returning a list of hashes. Possible extension to this could
+be to enable a column selection on the call:
+
+   my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
+
+Returning something like
+
+   [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
+       flags  => [ ... ],
+       errors => [ ... ],
+       },
+     { fields => [ ... ],
+       .
+       .
+       },
+     ]
+
+=back
+
 =head1 SEE ALSO
 
 L<perl(1)>, L<IO::File(3)>, L<IO::Wrap(3)>, L<Spreadsheet::Read(3)>
@@ -732,10 +841,13 @@
 Jochen Wiedmann F<E<lt>joe at ispsoft.deE<gt>> rewrote the encoding and
 decoding in C by implementing a simple finite-state machine and added
 the variable quote, escape and separator characters, the binary mode
-and the print and getline methods.
-
-H.Merijn Brand F<E<lt>h.m.brand at xs4all.nlE<gt>> cleaned up the code
-and added the field flags methods.
+and the print and getline methods. See ChangeLog releases 0.10 through
+0.23.
+
+H.Merijn Brand F<E<lt>h.m.brand at xs4all.nlE<gt>> cleaned up the code,
+added the field flags methods, wrote the major part of the test suite,
+completed the documentation, fixed some RT bugs. See ChangeLog releases
+0.25 and on.
 
 =head1 COPYRIGHT AND LICENSE
 

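The escape_char text added to the POD above says that, with the defaults (quote_char and escape_char both C<">), a doubled quote inside a quoted field stands for one literal quote. This is what makes the documented line "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz" valid. Below is a minimal C sketch of decoding one such field under that assumption. unquote_field is a hypothetical helper, not code from CSV_XS.xs. It only covers the case where escape_char equals quote_char, and it skips the binary/ASCII checks the real parser performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Text::CSV_XS: decode one quoted field
 * that begins at in[0] with the quote character.  escape_char is assumed
 * to equal quote_char (the documented default, '"'), so a doubled quote
 * inside the field yields one literal quote.  The unescaped text goes to
 * out, NUL-terminated.  Returns the number of input bytes consumed
 * (including both enclosing quotes), or -1 if in does not start with a
 * quote, the closing quote is missing, or out is too small. */
static long unquote_field(const char *in, char quote, char *out, size_t outsz)
{
    size_t i = 1, o = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0 || in[0] != quote)
        return -1;
    for (;;) {
        char c = in[i];

        if (c == '\0')
            return -1;                  /* unterminated quoted field */
        if (c == quote) {
            if (in[i + 1] != quote) {   /* lone quote closes the field */
                out[o] = '\0';
                return (long)(i + 1);
            }
            i += 2;                     /* "" -> one literal quote */
        }
        else
            i++;
        if (o + 1 >= outsz)
            return -1;                  /* keep room for the NUL */
        out[o++] = c;
    }
}
```

Applied to the third field of the POD example, the helper yields Escape "quote mark" with two "quote marks". The returned count points at the following comma, so a caller could continue with the next field from there.
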
Modified: packages/libtext-csv-perl/trunk/CSV_XS.xs
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/CSV_XS.xs?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/CSV_XS.xs (original)
+++ packages/libtext-csv-perl/trunk/CSV_XS.xs Fri Jun  1 14:48:56 2007
@@ -9,9 +9,11 @@
 #include <XSUB.h>
 #include "ppport.h"
 
-#define CSV_XS_TYPE_PV 0
-#define CSV_XS_TYPE_IV 1
-#define CSV_XS_TYPE_NV 2
+#define MAINT_DEBUG	0
+
+#define CSV_XS_TYPE_PV	0
+#define CSV_XS_TYPE_IV	1
+#define CSV_XS_TYPE_NV	2
 
 #define CSV_FLAGS_QUO	0x0001
 #define CSV_FLAGS_BIN	0x0002
@@ -41,6 +43,9 @@
     SV		*tmp;
     char	*types;
     STRLEN	 types_len;
+    char	*eol;
+    STRLEN	 eol_len;
+    int		 eol_is_cr;
     } csv_t;
 
 #define bool_opt(o) \
@@ -81,6 +86,15 @@
 	STRLEN len;
 	csv->types = SvPV (*svp, len);
 	csv->types_len = len;
+	}
+    csv->eol = NULL;
+    csv->eol_is_cr = 0;
+    if ((svp = hv_fetch (self, "eol",      3, 0)) && *svp && SvOK (*svp)) {
+	STRLEN len;
+	csv->eol = SvPV (*svp, len);
+	csv->eol_len = len;
+	if (len == 1 && *csv->eol == '\015')
+	    csv->eol_is_cr = 1;
 	}
 
     csv->binary		= bool_opt ("binary");
@@ -206,9 +220,12 @@
     return TRUE;
     } /* Combine */
 
-static void ParseError (csv_t *csv)
+static void ParseError (csv_t *csv, int ln)
 {
     if (csv->tmp) {
+#if MAINT_DEBUG
+	fprintf (stderr, "# Parse error on line %d: '%s'\n", ln, SvPV_nolen (csv->tmp));
+#endif
 	if (hv_store (csv->self, "_ERROR_INPUT", 12, csv->tmp, 0))
 	    SvREFCNT_inc (csv->tmp);
 	}
@@ -242,12 +259,12 @@
 
 #define ERROR_INSIDE_QUOTES {			\
     SvREFCNT_dec (insideQuotes);		\
-    ParseError (csv);				\
+    ParseError (csv, __LINE__);			\
     return FALSE;				\
     }
 #define ERROR_INSIDE_FIELD {			\
     SvREFCNT_dec (insideField);			\
-    ParseError (csv);				\
+    ParseError (csv, __LINE__);			\
     return FALSE;				\
     }
 
@@ -306,7 +323,7 @@
 		}
 	    }
 	else
-	if (c == '\012') {
+	if (c == '\012') { /* \n */
 	    if (waitingForField) {
 		av_push (fields, newSVpv ("", 0));
 		if (csv->flags)
@@ -327,9 +344,16 @@
 		}
 	    }
 	else
-	if (c == '\015') {
+	if (c == '\015') { /* \r */
 	    if (waitingForField) {
-		int	c2 = CSV_GET;
+		int	c2;
+
+		if (csv->eol_is_cr) {
+		    c = '\012';
+		    goto restart;
+		    }
+
+		c2 = CSV_GET;
 
 		if (c2 == EOF) {
 		    insideField = newSVpv ("", 0);
@@ -356,7 +380,14 @@
 		CSV_PUT_SV (insideQuotes, c);
 		}
 	    else {
-		int	c2 = CSV_GET;
+		int	c2;
+
+		if (csv->eol_is_cr) {
+		    AV_PUSH (insideField);
+		    return TRUE;
+		    }
+
+		c2 = CSV_GET;
 
 		if (c2 == '\012') {
 		    AV_PUSH (insideField);
@@ -390,19 +421,23 @@
 			return TRUE;
 
 		    if (c2 == '\015') {
-			int	c3 = CSV_GET;
-
+			int	c3;
+
+			if (csv->eol_is_cr)
+			    return TRUE;
+
+			c3 = CSV_GET;
 			if (c3 == '\012')
 			    return TRUE;
 
-			ParseError (csv);
+			ParseError (csv, __LINE__);
 			return FALSE;
 			}
 
 		    if (c2 == '\012')
 			return TRUE;
 
-		    ParseError (csv);
+		    ParseError (csv, __LINE__);
 		    return FALSE;
 		    }
 
@@ -431,7 +466,14 @@
 
 		else {
 		    if (c2 == '\015') {
-			int	c3 = CSV_GET;
+			int	c3;
+
+			if (csv->eol_is_cr) {
+			    AV_PUSH (insideQuotes);
+			    return TRUE;
+			    }
+
+			c3 = CSV_GET;
 
 			if (c3 == '\012') {
 			    AV_PUSH (insideQuotes);

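With the eol_is_cr changes above, a lone Carriage Return ends a record when eol is "\015". Otherwise a CR outside quotes must be followed by a Line Feed. Inside quoted fields, CR and LF remain field data, and the new "Embedded newlines" POD section relies on that. The following C sketch models those end-of-record rules on an in-memory buffer. record_end is a hypothetical helper, not code from CSV_XS.xs, and it makes several simplifying assumptions:

- binary mode is on, so newlines inside quotes are allowed;
- quote characters in the middle of unquoted fields are not rejected;
- a CR at the very end of the buffer counts as an error when eol_is_cr is off, which the real parser may handle differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the XS source: find the end of the first CSV
 * record in buf[0..len).  Rules modelled on the 0.27 changes to Parse():
 *  - outside quotes, "\n" ends a record;
 *  - outside quotes, "\r" ends a record when eol_is_cr is set (eol "\r"),
 *    otherwise it must be followed by "\n";
 *  - inside quotes, CR and LF are ordinary data (binary mode assumed),
 *    and a doubled quote is an escaped quote.
 * Returns the offset just past the terminator, len if the buffer ends
 * without one, or -1 on a CR without LF or an unterminated quote. */
static long record_end(const char *buf, size_t len, char quote, int eol_is_cr)
{
    int in_quotes = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        char c = buf[i];

        if (c == quote) {
            if (in_quotes && i + 1 < len && buf[i + 1] == quote)
                i++;                     /* "" inside quotes: stay quoted */
            else
                in_quotes = !in_quotes;
        }
        else if (in_quotes)
            continue;                    /* embedded CR/LF is field data */
        else if (c == '\n')
            return (long)(i + 1);
        else if (c == '\r') {
            if (eol_is_cr)
                return (long)(i + 1);    /* lone CR is the terminator */
            if (i + 1 < len && buf[i + 1] == '\n')
                return (long)(i + 2);    /* CR LF */
            return -1;                   /* CR without LF: parse error */
        }
    }
    return in_quotes ? -1 : (long)len;
}
```

For a buffer such as a,"b<LF>c",d<LF>next<LF>, the sketch stops at the second LF, because the first one sits inside quotes. A plain while (<>) loop splits that record, since it stops at the first LF regardless of quoting.
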
Modified: packages/libtext-csv-perl/trunk/ChangeLog
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/ChangeLog?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/ChangeLog (original)
+++ packages/libtext-csv-perl/trunk/ChangeLog Fri Jun  1 14:48:56 2007
@@ -1,3 +1,17 @@
+2007-05-31  0.27 - H.Merijn Brand   <h.m.brand at xs4all.nl>
+
+	* checked with perlcritic (still works under 5.00504)
+	  so 3-arg open cannot be used (except in the docs)
+	* 3-arg open in docs too
+	* Added a lot to the TODO list
+	* Some more info on using escape character (jZed)
+	* Mention Text::CSV_PP in README
+	* Added t/45_eol.t, eol tests
+	* Added a section about embedded newlines in the pod
+	* Allow \r as eol ($/) for parsing
+	* More docs for eol
+	* More eol = \r fixes, tfrayner's test case added to t/45_eol.t
+
 2007-05-15  0.26 - H.Merijn Brand   <h.m.brand at xs4all.nl>
 
 	* Add $csv->allow_undef (1) suggestion in TODO

Modified: packages/libtext-csv-perl/trunk/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/MANIFEST?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/MANIFEST (original)
+++ packages/libtext-csv-perl/trunk/MANIFEST Fri Jun  1 14:48:56 2007
@@ -13,6 +13,7 @@
 t/20_file.t		IO tests (print and getline)
 t/30_types.t		Tests for the "types" attribute.
 t/40_misc.t		Binary mode tests
+t/45_eol.t		Embedded EOL
 t/50_utf8.t		Unicode stress tests
 t/55_combi.t		Different CSV character combinations
 t/60_samples.t		Miscellaneous problems from the modules history.

Modified: packages/libtext-csv-perl/trunk/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/META.yml?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/META.yml (original)
+++ packages/libtext-csv-perl/trunk/META.yml Fri Jun  1 14:48:56 2007
@@ -1,6 +1,6 @@
 --- #YAML:1.0
 name:                Text-CSV_XS
-version:             0.26
+version:             0.27
 abstract:            Comma-Separated Values manipulation routines
 license:             perl
 generated_by:        ExtUtils::MakeMaker version 6.32

Modified: packages/libtext-csv-perl/trunk/README
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/README?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/README (original)
+++ packages/libtext-csv-perl/trunk/README Fri Jun  1 14:48:56 2007
@@ -28,3 +28,6 @@
     Jochen Wiedmann <joe at ispsoft.de>
 
     Interface design by Alan Citterman <alan at mfgrtl.com>
+
+    A pure-perl version is being maintained by Makamaka Hannyaharamitu
+    as Text::CSV_PP, which tries to follow Text::CSV_XS very closely.

Modified: packages/libtext-csv-perl/trunk/debian/changelog
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/debian/changelog?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/debian/changelog (original)
+++ packages/libtext-csv-perl/trunk/debian/changelog Fri Jun  1 14:48:56 2007
@@ -1,8 +1,9 @@
-libtext-csv-perl (0.26-2) UNRELEASED; urgency=low
+libtext-csv-perl (0.27-1) unstable; urgency=low
 
-  * NOT RELEASED YET
+  * New upstream release
+  * debian/control: added me to Uploaders
 
- -- Damyan Ivanov <dmn at debian.org>  Tue, 22 May 2007 12:20:32 +0300
+ -- Krzysztof Krzyzaniak (eloy) <eloy at debian.org>  Fri, 01 Jun 2007 16:47:47 +0200
 
 libtext-csv-perl (0.26-1) unstable; urgency=low
 

Modified: packages/libtext-csv-perl/trunk/debian/control
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/debian/control?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/debian/control (original)
+++ packages/libtext-csv-perl/trunk/debian/control Fri Jun  1 14:48:56 2007
@@ -1,6 +1,6 @@
 Source: libtext-csv-perl
 Maintainer: Debian Perl Group <pkg-perl-maintainers at lists.alioth.debian.org>
-Uploaders: Gunnar Wolf <gwolf at debian.org>, Niko Tyni <ntyni at iki.fi>, gregor herrmann <gregor+debian at comodo.priv.at>
+Uploaders: Gunnar Wolf <gwolf at debian.org>, Niko Tyni <ntyni at iki.fi>, gregor herrmann <gregor+debian at comodo.priv.at>, Krzysztof Krzyzaniak (eloy) <eloy at debian.org>
 Section: perl
 Priority: optional
 Standards-Version: 3.7.2




More information about the Pkg-perl-cvs-commits mailing list