r5466 - in /packages/libmarc-charset-perl/trunk: Changes MANIFEST META.yml README debian/changelog debian/control lib/MARC/Charset.pm lib/MARC/Charset/Constants.pm t/entities.t t/escape1.t t/escape2.t t/marc8_to_utf8.t t/utf8.t

gregoa-guest at users.alioth.debian.org
Fri May 18 22:53:45 UTC 2007


Author: gregoa-guest
Date: Fri May 18 22:53:45 2007
New Revision: 5466

URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=5466
Log:
* New upstream release.
* Set Standards-Version to 3.7.2 (no changes).

Added:
    packages/libmarc-charset-perl/trunk/t/marc8_to_utf8.t
      - copied unchanged from r5465, packages/libmarc-charset-perl/branches/upstream/current/t/marc8_to_utf8.t
Removed:
    packages/libmarc-charset-perl/trunk/t/entities.t
Modified:
    packages/libmarc-charset-perl/trunk/Changes
    packages/libmarc-charset-perl/trunk/MANIFEST
    packages/libmarc-charset-perl/trunk/META.yml
    packages/libmarc-charset-perl/trunk/README
    packages/libmarc-charset-perl/trunk/debian/changelog
    packages/libmarc-charset-perl/trunk/debian/control
    packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm
    packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm
    packages/libmarc-charset-perl/trunk/t/escape1.t
    packages/libmarc-charset-perl/trunk/t/escape2.t
    packages/libmarc-charset-perl/trunk/t/utf8.t

Modified: packages/libmarc-charset-perl/trunk/Changes
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/Changes?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/Changes (original)
+++ packages/libmarc-charset-perl/trunk/Changes Fri May 18 22:53:45 2007
@@ -1,8 +1,14 @@
 Revision history for MARC::Charset
 
-0.95 Tue Feb  7 11:38:05 EST 2006
-     - bugfix in combining character handling  (thanks Mike Rylander)
-     - added t/entities.t
+0.96 Wed Mar 14 01:24:48 EDT 2007
+     - added ignore_errors() to skip MARC8 -> UTF8 snafus
+     - added assume_encoding() to treat transcoding failures as if they
+       are from a known, specific encoding.  Useful if you have a set of
+       records that, for instance, report being MARC8 but are actually
+       encoded in Latin1 (which, btw, is completely invalid and also very
+       common).  Only in effect when ignore_errors() is true.
+     - added assume_unicode() to treat invalid MARC8 as UTF8.  This is a
+       convenience function based on assume_encoding().
 
 0.92 Sat Feb  4 19:34:19 CST 2006
      - marc8_to_utf8 and utf8_to_marc8 needed to pass along spaces 

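The Changes entry above describes a fallback for records that claim to be MARC-8 but cannot be mapped. In that case the record is decoded with a known encoding instead. The sketch below shows only that fallback step. It uses the same two calls the 0.96 code adds: Encode's decode(), followed by NFC() from Unicode::Normalize. The helper name assumed_decode() is hypothetical and is not part of the MARC::Charset API.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper (not part of MARC::Charset): decode a whole byte
# string with an assumed encoding, then normalize it to NFC.
sub assumed_decode {
    my ($bytes, $encoding) = @_;
    return NFC(decode($encoding, $bytes));
}

# A "MARC-8" record that is really Latin-1, one that is really cp850,
# and one that is really UTF-8 with a decomposed e + combining acute.
my @results = (
    assumed_decode("caf\xE9", 'iso-8859-1'),
    assumed_decode("caf\x82", 'cp850'),
    assumed_decode("cafe\xCC\x81", 'UTF-8'),
);
```

With MARC::Charset 0.96 itself, the documented way to get this behaviour is to call MARC::Charset->ignore_errors(1). Then call either MARC::Charset->assume_encoding('cp850') or MARC::Charset->assume_unicode(1). The assumed encoding has no effect unless errors are being ignored.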
Modified: packages/libmarc-charset-perl/trunk/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/MANIFEST?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/MANIFEST (original)
+++ packages/libmarc-charset-perl/trunk/MANIFEST Fri May 18 22:53:45 2007
@@ -8,7 +8,7 @@
 lib/MARC/Charset/Constants.pm
 lib/MARC/Charset/Table.pm
 Makefile.PL
-MANIFEST
+MANIFEST			This list of files
 META.yml
 README
 t/cjk.t
@@ -19,7 +19,6 @@
 t/code.t
 t/cyrillic.marc
 t/decompose.t
-t/entities.t
 t/escape1.t
 t/escape2.t
 t/hebrew1.marc

Modified: packages/libmarc-charset-perl/trunk/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/META.yml?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/META.yml (original)
+++ packages/libmarc-charset-perl/trunk/META.yml Fri May 18 22:53:45 2007
@@ -1,7 +1,7 @@
 # http://module-build.sourceforge.net/META-spec.html
 #XXXXXXX This is a prototype!!!  It will change in the future!!! XXXXX#
 name:         MARC-Charset
-version:      0.95
+version:      0.96
 version_from: lib/MARC/Charset.pm
 installdirs:  site
 requires:

Modified: packages/libmarc-charset-perl/trunk/README
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/README?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/README (original)
+++ packages/libmarc-charset-perl/trunk/README Fri May 18 22:53:45 2007
@@ -21,7 +21,7 @@
 Unicode notwithstanding, libraries still have a wealth of data encoded using 
 MARC-8. Yet, some new data formats such as XML require that characters are 
 encoded using Unicode. In order to fascilitate conversion the Library of 
-Congress graciously published character mappings to enable the conversion 
+Congress graciously published character mappings to facilitate the conversion 
 of MARC-8 data to Unicode. 
 
 MARC::Charset is basically an implementation of the character mappings that 

Modified: packages/libmarc-charset-perl/trunk/debian/changelog
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/debian/changelog?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/debian/changelog (original)
+++ packages/libmarc-charset-perl/trunk/debian/changelog Fri May 18 22:53:45 2007
@@ -1,3 +1,10 @@
+libmarc-charset-perl (0.96-1) unstable; urgency=low
+
+  * New upstream release.
+  * Set Standards-Version to 3.7.2 (no changes).
+
+ -- gregor herrmann <gregor+debian at comodo.priv.at>  Sat, 19 May 2007 00:53:27 +0200
+
 libmarc-charset-perl (0.95-2) unstable; urgency=low
 
   * Fix typo in Description

Modified: packages/libmarc-charset-perl/trunk/debian/control
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/debian/control?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/debian/control (original)
+++ packages/libmarc-charset-perl/trunk/debian/control Fri May 18 22:53:45 2007
@@ -5,7 +5,7 @@
 Build-Depends-Indep: perl (>= 5.8.0-7), libxml-sax-perl, libclass-accessor-perl, libtest-pod-perl
 Maintainer: Debian Perl Group <pkg-perl-maintainers at lists.alioth.debian.org>
 Uploaders: gregor herrmann <gregor+debian at comodo.priv.at>
-Standards-Version: 3.6.2
+Standards-Version: 3.7.2
 XS-Vcs-Svn: svn://svn.debian.org/pkg-perl/packages/libmarc-charset-perl/trunk/
 
 Package: libmarc-charset-perl

Modified: packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm (original)
+++ packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm Fri May 18 22:53:45 2007
@@ -1,6 +1,6 @@
 package MARC::Charset;
 
-our $VERSION = '0.95';
+our $VERSION = '0.96';
 use strict;
 use warnings;
 
@@ -8,6 +8,7 @@
 our @EXPORT_OK = qw(marc8_to_utf8 utf8_to_marc8);
 
 use Unicode::Normalize;
+use Encode 'decode';
 use MARC::Charset::Table;
 use MARC::Charset::Constants qw(:all);
 
@@ -47,6 +48,72 @@
 our $DEFAULT_G0 = ASCII_DEFAULT; 
 our $DEFAULT_G1 = EXTENDED_LATIN;
 
+=head2 ignore_errors()
+
+Tells MARC::Charset whether or not to ignore all encoding errors, and
+returns the current setting.  This is helpful if you have records that
+contain both MARC8 and UNICODE characters.
+
+    my $ignore = MARC::Charset->ignore_errors();
+    
+    MARC::Charset->ignore_errors(1); # ignore errors
+    MARC::Charset->ignore_errors(0); # DO NOT ignore errors
+
+=cut
+
+
+our $_ignore_errors = 0;
+sub ignore_errors {
+	my ($self,$i) = @_;
+	$_ignore_errors = $i if (defined($i));
+	return $_ignore_errors;
+}
+
+
+=head2 assume_unicode()
+
+Tells MARC::Charset whether or not to assume UNICODE when an error is
+encountered in ignore_errors mode and returns the current setting.
+This is helpful if you have records that contain both MARC8 and UNICODE
+characters.
+
+    my $setting = MARC::Charset->assume_unicode();
+    
+    MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
+    MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode
+
+=cut
+
+
+our $_assume = '';
+sub assume_unicode {
+	my ($self,$i) = @_;
+	if (defined($i)) { $_assume = $i ? 'utf8' : ($_assume eq 'utf8' ? '' : $_assume) }
+	return ($_assume eq 'utf8') ? 1 : 0;
+}
+
+
+=head2 assume_encoding()
+
+Tells MARC::Charset whether or not to assume a specific encoding when an error
+is encountered in ignore_errors mode and returns the current setting.  This
+is helpful if you have records that contain both MARC8 and other characters.
+
+    my $setting = MARC::Charset->assume_encoding();
+    
+    MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
+    MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
+
+=cut
+
+
+sub assume_encoding {
+	my ($self,$i) = @_;
+	$_assume = $i if (defined($i));
+	return $_assume;
+}
+
+
 # place holders for working graphical character sets
 my $G0; 
 my $G1;
@@ -58,9 +125,15 @@
     my $utf8 = marc8_to_utf8($marc8);
 
 If you'd like to ignore errors pass in a true value as the 2nd 
-parameter:
+parameter or call MARC::Charset->ignore_errors() with a true
+value:
 
     my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');
+
+  or
+  
+    MARC::Charset->ignore_errors(1);
+    my $utf8 = marc8_to_utf8($marc8);
 
 =cut
 
@@ -70,6 +143,8 @@
     my ($marc8, $ignore_errors) = @_;
     reset_charsets();
 
+    $ignore_errors = $_ignore_errors if (!defined($ignore_errors));
+
     # holder for our utf8
     my $utf8 = '';
 
@@ -95,14 +170,14 @@
         }
 
         my $found;
-        CHARSET_LOOP: foreach my $charset ($G0, $G1) 
+	CHARSET_LOOP: foreach my $charset ($G0, $G1) 
         {
 
             # cjk characters are a string of three chars
-            my $char_size = $charset eq CJK ? 3 : 1;
+	    my $char_size = $charset eq CJK ? 3 : 1;
 
             # extract the next code point to examine
-            my $chunk = substr($marc8, $index, $char_size);
+	    my $chunk = substr($marc8, $index, $char_size);
 
             # look up the character to see if it's in our mapping 
             my $code = $table->lookup_by_marc8($charset, $chunk);
@@ -118,7 +193,7 @@
             if ($code->is_combining())
             {
                 $combining .= $code->char_value();
-            }
+	    }
             else
             {
                 $utf8 .= $code->char_value() . $combining;
@@ -127,18 +202,23 @@
 
             $index += $char_size;
             next CHAR_LOOP;
-        }
+	}
 
         if (!$found)
         {
-            warn("no mapping found at position $index in $marc8 ".
+            warn(sprintf("no mapping found for [0x\%X] at position $index in $marc8 ".
                 "g0=".MARC::Charset::Constants::charset_name($G0) . " " .
-                "g1=".MARC::Charset::Constants::charset_name($G1));
+                "g1=".MARC::Charset::Constants::charset_name($G1), unpack('C',substr($marc8,$index,1))));
             if (!$ignore_errors)
             {
                 reset_charsets();
                 return;
             }
+            if ($_assume)
+            {
+                reset_charsets();
+                return NFC(decode($_assume => $marc8));
+            }
             $index += 1;
         }
 
@@ -162,6 +242,11 @@
 parameter:
 
     my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
+
+  or
+  
+    MARC::Charset->ignore_errors(1);
+    my $marc8 = utf8_to_marc8($utf8);
 
 =cut
 
@@ -169,6 +254,8 @@
 {
     my ($utf8, $ignore_errors) = @_;
     reset_charsets();
+
+    $ignore_errors = $_ignore_errors if (!defined($ignore_errors));
 
     # decompose combined characters
     $utf8 = NFD($utf8);
@@ -334,22 +421,22 @@
     }
 
     elsif ( $esc_char_1 eq MULTI_G0_A ) {
-        $G0 = $esc_char_2;
+	$G0 = $esc_char_2;
         return $left+3;
     }
 
     elsif ($esc_chars eq MULTI_G0_B 
         and ($left+3 < $right)) 
     {
-        $G0 = substr($$str_ref, $left+3, 1);
-        return $left+4;
+	$G0 = substr($$str_ref, $left+3, 1);
+	return $left+4;
     }
 
     elsif (($esc_chars eq MULTI_G1_A or $esc_chars eq MULTI_G1_B)
         and ($left + 3 < $right)) 
     {
-        $G1 = substr($$str_ref, $left+3, 1);
-        return $left+4;
+	$G1 = substr($$str_ref, $left+3, 1);
+	return $left+4;
     }
 
     # we should never get here

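The error handling added to marc8_to_utf8() above combines three pieces of state:

- the optional second argument;
- the package-wide ignore_errors() flag;
- the assume_encoding()/assume_unicode() setting.

What follows is a minimal model of that decision, assuming the control flow shown in the diff. on_unmapped() and its return labels are hypothetical names used only for illustration. The key detail is that the per-call argument is tested with defined(), so an explicit 0 overrides a global setting of 1.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the package globals added in 0.96.
our $_ignore_errors = 0;
our $_assume        = '';

# Models what the converter does when a byte has no mapping:
#   'fail'   - reset the charsets and return undef (errors not ignored)
#   'decode' - re-decode the whole string with the assumed encoding
#   'skip'   - step over the byte and keep converting
sub on_unmapped {
    my ($ignore_errors) = @_;
    # Only an undefined argument falls back to the package-wide flag.
    $ignore_errors = $_ignore_errors if !defined $ignore_errors;
    return 'fail'   if !$ignore_errors;
    return 'decode' if $_assume;
    return 'skip';
}
```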
Modified: packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm (original)
+++ packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm Fri May 18 22:53:45 2007
@@ -19,46 +19,46 @@
 use warnings;
 use base qw( Exporter );
 
-use constant ESCAPE             => chr(0x1B);
+use constant ESCAPE		=> chr(0x1B);
 
-use constant SINGLE_G0_A        => chr(0x28);
-use constant SINGLE_G0_B        => chr(0x2C);
-use constant MULTI_G0_A         => chr(0x24);
-use constant MULTI_G0_B         => chr(0x24) . chr(0x2C);
+use constant SINGLE_G0_A	=> chr(0x28);
+use constant SINGLE_G0_B	=> chr(0x2C);
+use constant MULTI_G0_A		=> chr(0x24);
+use constant MULTI_G0_B		=> chr(0x24) . chr(0x2C);
 
-use constant SINGLE_G1_A        => chr(0x29);
-use constant SINGLE_G1_B        => chr(0x2D);
-use constant MULTI_G1_A         => chr(0x24) . chr(0x29);
-use constant MULTI_G1_B         => chr(0x24) . chr(0x2D);
+use constant SINGLE_G1_A	=> chr(0x29);
+use constant SINGLE_G1_B	=> chr(0x2D);
+use constant MULTI_G1_A		=> chr(0x24) . chr(0x29);
+use constant MULTI_G1_B		=> chr(0x24) . chr(0x2D);
 
-use constant GREEK_SYMBOLS      => chr(0x67);
-use constant SUBSCRIPTS         => chr(0x62);
-use constant SUPERSCRIPTS       => chr(0x70);
-use constant ASCII_DEFAULT      => chr(0x73);
+use constant GREEK_SYMBOLS	=> chr(0x67);
+use constant SUBSCRIPTS		=> chr(0x62);
+use constant SUPERSCRIPTS	=> chr(0x70);
+use constant ASCII_DEFAULT	=> chr(0x73);
 
-use constant BASIC_ARABIC       => chr(0x33);
-use constant EXTENDED_ARABIC    => chr(0x34);
-use constant BASIC_LATIN        => chr(0x42);
-use constant EXTENDED_LATIN     => chr(0x45);
-use constant CJK                => chr(0x31);
-use constant BASIC_CYRILLIC     => chr(0x4E);
-use constant EXTENDED_CYRILLIC  => chr(0x51);
-use constant BASIC_GREEK        => chr(0x53);
-use constant BASIC_HEBREW       => chr(0x32);
+use constant BASIC_ARABIC	=> chr(0x33);
+use constant EXTENDED_ARABIC	=> chr(0x34);
+use constant BASIC_LATIN	=> chr(0x42);
+use constant EXTENDED_LATIN	=> chr(0x45);
+use constant CJK		=> chr(0x31);
+use constant BASIC_CYRILLIC	=> chr(0x4E);
+use constant EXTENDED_CYRILLIC	=> chr(0x51);
+use constant BASIC_GREEK	=> chr(0x53);
+use constant BASIC_HEBREW	=> chr(0x32);
 
 our %EXPORT_TAGS = ( all => [ qw( 
-    ESCAPE  GREEK_SYMBOLS  SUBSCRIPTS  SUPERSCRIPTS  ASCII_DEFAULT
-    SINGLE_G0_A  SINGLE_G0_B  MULTI_G0_A  MULTI_G0_B  SINGLE_G1_A 
-    SINGLE_G1_B  MULTI_G1_A  MULTI_G1_B  BASIC_ARABIC  
-    EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK  BASIC_CYRILLIC  
-    EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW) ]);
+	ESCAPE  GREEK_SYMBOLS  SUBSCRIPTS  SUPERSCRIPTS  ASCII_DEFAULT
+	SINGLE_G0_A  SINGLE_G0_B  MULTI_G0_A  MULTI_G0_B  SINGLE_G1_A 
+	SINGLE_G1_B  MULTI_G1_A  MULTI_G1_B  BASIC_ARABIC  
+	EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK  BASIC_CYRILLIC  
+	EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW) ]);
 
 our @EXPORT_OK = qw(
-    ESCAPE  GREEK_SYMBOLS  SUBSCRIPTS  SUPERSCRIPTS ASCII_DEFAULT
-    SINGLE_G0_A  SINGLE_G0_B  MULTI_G0_A  MULTI_G0_B  SINGLE_G1_A 
-    SINGLE_G1_B  MULTI_G1_A  MULTI_G1_B  BASIC_ARABIC  
-    EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK  BASIC_CYRILLIC  
-    EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW);
+	ESCAPE  GREEK_SYMBOLS  SUBSCRIPTS  SUPERSCRIPTS ASCII_DEFAULT
+	SINGLE_G0_A  SINGLE_G0_B  MULTI_G0_A  MULTI_G0_B  SINGLE_G1_A 
+	SINGLE_G1_B  MULTI_G1_A  MULTI_G1_B  BASIC_ARABIC  
+	EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK  BASIC_CYRILLIC  
+	EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW);
 
 sub charset_name
 {

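The constants above are single characters. The converter and the tests concatenate them into escape sequences, for example ESCAPE . SINGLE_G0_A . BASIC_GREEK. The sketch below copies a few of the values from Constants.pm and prints the resulting byte sequences. hexdump() is a hypothetical helper used only for display.

```perl
use strict;
use warnings;

# Values copied from MARC::Charset::Constants as shown in this diff.
use constant ESCAPE          => chr(0x1B);
use constant SINGLE_G0_A     => chr(0x28);
use constant SINGLE_G1_A     => chr(0x29);
use constant MULTI_G0_A      => chr(0x24);
use constant BASIC_GREEK     => chr(0x53);
use constant EXTENDED_ARABIC => chr(0x34);
use constant CJK             => chr(0x31);
use constant ASCII_DEFAULT   => chr(0x73);

# Hypothetical display helper: bytes as space-separated hex.
sub hexdump { join ' ', map { sprintf('%02X', ord($_)) } split //, shift }

my $to_greek  = ESCAPE . SINGLE_G0_A . BASIC_GREEK;      # Greek into G0
my $to_ext_ar = ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC;  # Extended Arabic into G1
my $to_cjk    = ESCAPE . MULTI_G0_A . CJK;               # multibyte CJK into G0
my $back      = ESCAPE . ASCII_DEFAULT;                  # back to ASCII

print hexdump($_), "\n" for $to_greek, $to_ext_ar, $to_cjk, $back;
```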
Modified: packages/libmarc-charset-perl/trunk/t/escape1.t
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/t/escape1.t?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/t/escape1.t (original)
+++ packages/libmarc-charset-perl/trunk/t/escape1.t Fri May 18 22:53:45 2007
@@ -12,9 +12,9 @@
 
 my $test = 
     'it is all greek ' . 
-    ESCAPE . GREEK_SYMBOLS .                  ## escape to Greek Symbols
-    chr(0x61) . chr(0x62) . chr(0x63) .       ## ALPHA BETA GAMMA
-    ESCAPE . ASCII_DEFAULT.                   ## back to ASCII
+    ESCAPE . GREEK_SYMBOLS .		    ## escape to Greek Symbols
+    chr(0x61) . chr(0x62) . chr(0x63) .	    ## ALPHA BETA GAMMA
+    ESCAPE . ASCII_DEFAULT.		    ## back to ASCII
     ' to me';
 
 my $expected = 
@@ -28,23 +28,26 @@
 ## Subscripts
 
 $test = 
-    'subscript1' .                    
-    ESCAPE . SUBSCRIPTS .                  ## escape to Subscripts 
-    chr(0x31) .                            ## subscript 1
-    ESCAPE . ASCII_DEFAULT .               ## back to ASCII
-    'subscript9' .            
-    ESCAPE . SUBSCRIPTS .                  ## escape to Subscripts
-    chr(0x39) .                            ## subscript 9
-    ESCAPE . ASCII_DEFAULT .               ## back to ASCII
+    'subscript1' .		    
+    ESCAPE . SUBSCRIPTS .		    ## escape to Subscripts 
+    chr(0x31) . 			    ## subscript 1
+    ESCAPE . ASCII_DEFAULT .		    ## back to ASCII
+    'subscript9' .	    
+    ESCAPE . SUBSCRIPTS .		    ## escape to Subscripts
+    chr(0x39) .				    ## subscript 9
+    ESCAPE . ASCII_DEFAULT .		    ## back to ASCII
     'subscript10' . 
-    ESCAPE . SUBSCRIPTS .                  ## back to Subscripts again
-    chr(0x31) . chr(0x30) .                ## subscript 10
-    ESCAPE . ASCII_DEFAULT;                ## back to ASCII
+    ESCAPE . SUBSCRIPTS .		    ## back to Subscripts again
+    chr(0x31) . chr(0x30) .		    ## subscript 10
+    ESCAPE . ASCII_DEFAULT;		    ## back to ASCII
 
 $expected = 
     'subscript1' . chr(0x2081) . 
     'subscript9' . chr(0x2089) . 
     'subscript10' . chr(0x2081) . chr(0x2080); 
+    # ucs 'subscript1' . chr(0xE28281) . 
+    # ucs 'subscript9' . chr(0xE28289) . 
+    # ucs 'subscript10' . chr(0xE28281) . chr(0xE28280); 
 
 is( marc8_to_utf8($test), $expected, 'Subscripts' );
 
@@ -53,22 +56,25 @@
 
 $test =
     'superscript1' . 
-    ESCAPE . SUPERSCRIPTS .                    ## escape to Superscripts
-    chr(0x31) .                                ## superscript 1
-    ESCAPE . ASCII_DEFAULT .                   ## back to ASCII
+    ESCAPE . SUPERSCRIPTS .		    ## escape to Superscripts
+    chr(0x31) .				    ## superscript 1
+    ESCAPE . ASCII_DEFAULT .		    ## back to ASCII
     'superscript9' . 
-    ESCAPE . SUPERSCRIPTS .                    ## escape to Superscripts
-    chr(0x39) .                                ## superscript 9
-    ESCAPE . ASCII_DEFAULT .                   ## back to ASCII
+    ESCAPE . SUPERSCRIPTS .		    ## escape to Superscripts
+    chr(0x39) .				    ## superscript 9
+    ESCAPE . ASCII_DEFAULT .		    ## back to ASCII
     'superscript10' .
     ESCAPE . SUPERSCRIPTS . 
-    chr(0x31) . chr(0x30) .                    ## superscript 10
-    ESCAPE . ASCII_DEFAULT;                    ## back to ASCII
+    chr(0x31) . chr(0x30) .		    ## superscript 10
+    ESCAPE . ASCII_DEFAULT; 		    ## back to ASCII
 
 $expected = 
     'superscript1' . chr(0x00B9) . 
     'superscript9' . chr(0x2079) . 
     'superscript10' . chr(0x00B9) . chr(0x2070); 
+    # ucs 'superscript1' . chr(0xC2B9) .
+    # ucs 'superscript9' . chr(0xE281B9) . 
+    # ucs 'superscript10' . chr(0xC2B9) . chr(0xE281B0); 
 
 is( marc8_to_utf8($test), $expected, 'Superscripts' );
     

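Some of the `# ucs` comments added to escape1.t contain values such as chr(0xE28281). These match the UTF-8 byte sequences of the expected characters written as a single hex number. They are not code points, which is presumably why those lines stay commented out. For example, E2 82 81 is the UTF-8 encoding of U+2081 SUBSCRIPT ONE. The check below uses the hypothetical helper utf8_hex() to confirm the correspondence.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 encoding of a character string as hex bytes.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Code points used in the Subscripts and Superscripts expectations.
printf "U+%04X => %s\n", ord($_), utf8_hex($_)
    for map { chr($_) } 0x2081, 0x2089, 0x2080, 0x00B9, 0x2079, 0x2070;
```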
Modified: packages/libmarc-charset-perl/trunk/t/escape2.t
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/t/escape2.t?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/t/escape2.t (original)
+++ packages/libmarc-charset-perl/trunk/t/escape2.t Fri May 18 22:53:45 2007
@@ -12,11 +12,11 @@
 ## test some ASCII & Greek mixed together
 
 my $test = 
-    'this is greek' .                       ## regular ASCII
+    'this is greek' .			    ## regular ASCII
     ESCAPE . SINGLE_G0_A . BASIC_GREEK .    ## set G0 to Greek
-    chr(0x49) .                             ## zeta
+    chr(0x49) .				    ## zeta
     ESCAPE . SINGLE_G0_A . BASIC_LATIN .    ## set GO to ASCII
-    'this is not';                          ## regular ASCII
+    'this is not';			    ## regular ASCII
 
 my $expected = 'this is greek' . chr(0x0396) . 'this is not';
 is(marc8_to_utf8($test), $expected, 'escape type 2 to Greek');
@@ -26,8 +26,8 @@
 $test = 
     ESCAPE . SINGLE_G0_A . BASIC_ARABIC .   ## set G0 to ArabicBasic
     ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC. ## set G1 to ArabicExtended
-    chr(0x4d) .                             ## HAH (from Basic)
-    chr(0xBA);                              ## DUL (from Extended)
+    chr(0x4d) .				    ## HAH (from Basic)
+    chr(0xBA);				    ## DUL (from Extended)
 
 $expected = chr(0x062D) . chr(0x068E);
 is(marc8_to_utf8($test), $expected, 'escape type 2 to Basic+Ext Arabic');
@@ -37,10 +37,10 @@
 $test = 
     ESCAPE . SINGLE_G0_A . BASIC_ARABIC .   ## set G0 to ArabicBasic
     ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC. ## set G1 to ArabicExtended
-    chr(0x47) .                             ## ALEF (Arabic Basic) 
+    chr(0x47) .				    ## ALEF (Arabic Basic) 
     ESCAPE . SINGLE_G0_A . BASIC_HEBREW .   ## replace ArabicBasic with Hebrew
-    chr(0x71) .                             ## SAMEKH (Hebrew)
-    chr(0xE9);                              ## RNOON (ArabicExtended)
+    chr(0x71) .				    ## SAMEKH (Hebrew)
+    chr(0xE9); 				    ## RNOON (ArabicExtended)
 
 $expected = chr(0x0627) . chr(0x05E1) . chr(0x06BB);
 is(marc8_to_utf8($test), $expected, 'escape type 2 Arabic + Hebrew mixed');

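The Arabic tests above depend on G0 and G1 being active at the same time:

- 0x4D is taken from Basic Arabic in G0.
- 0xBA and 0xE9 come from Extended Arabic in G1, even after G0 has been switched to Hebrew.

Under the ISO 2022 conventions that MARC-8 follows, graphic bytes 0x21 to 0x7E address G0, and 0xA1 to 0xFE address G1. The helper below is a hypothetical illustration of that split. MARC::Charset itself does not check byte ranges: it tries $G0 and then $G1 in CHARSET_LOOP and lets the mapping table decide.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Charset): report which graphic
# set a single MARC-8 byte addresses under ISO 2022 conventions.
sub graphic_set {
    my ($byte) = @_;
    return 'G0' if $byte >= 0x21 && $byte <= 0x7E;
    return 'G1' if $byte >= 0xA1 && $byte <= 0xFE;
    return 'other';    # controls, space, DEL, ESC and so on
}

# Bytes from the escape2.t tests: HAH, DUL, ALEF, SAMEKH, RNOON.
printf "0x%02X => %s\n", $_, graphic_set($_) for 0x4D, 0xBA, 0x47, 0x71, 0xE9;
```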
Modified: packages/libmarc-charset-perl/trunk/t/utf8.t
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/t/utf8.t?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/t/utf8.t (original)
+++ packages/libmarc-charset-perl/trunk/t/utf8.t Fri May 18 22:53:45 2007
@@ -22,35 +22,35 @@
 is(
     utf8_to_marc8(chr(0x0628)),
     ESCAPE . SINGLE_G0_A . BASIC_ARABIC . chr(0x48) . 
-    ESCAPE . ASCII_DEFAULT,
+	ESCAPE . ASCII_DEFAULT,
     'Basic Arabic' 
 );
 
 is(
     utf8_to_marc8(chr(0x068D)),
     ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC . chr(0xB9) . 
-    ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
+	ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
     'Extended Arabic'
 );
 
 is(
     utf8_to_marc8(chr(0x0440)),
     ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x52) . 
-    ESCAPE . ASCII_DEFAULT,
+	ESCAPE . ASCII_DEFAULT,
     'Basic Cyrillic'
 );
 
 is(
     utf8_to_marc8(chr(0x0408)),
     ESCAPE . SINGLE_G1_A . EXTENDED_CYRILLIC . chr(0xE8) . 
-    ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
+	ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
     'Extended Cyrillic'
 );
 
 is(
     utf8_to_marc8(chr(0x0398)),
     ESCAPE . SINGLE_G0_A . BASIC_GREEK . chr(0x4B) . 
-    ESCAPE . ASCII_DEFAULT,
+	ESCAPE . ASCII_DEFAULT,
     'Greek'
 );
 
@@ -60,7 +60,7 @@
 is(
     utf8_to_marc8(chr(0x05E0)),
     ESCAPE . SINGLE_G0_A . BASIC_HEBREW . chr(0x70) . 
-    ESCAPE . ASCII_DEFAULT,
+	ESCAPE . ASCII_DEFAULT,
     'Hebrew' 
 );
 
@@ -77,7 +77,7 @@
 is(
     utf8_to_marc8(chr(0x71AC)),
     ESCAPE . MULTI_G0_A . CJK . chr(0x21) . chr(0x49) . chr(0x7C) . 
-    ESCAPE . ASCII_DEFAULT, 
+	ESCAPE . ASCII_DEFAULT, 
     'East Asian'
 );
 
@@ -90,7 +90,8 @@
 );
 
 is(
-    utf8_to_marc8('abc' . chr(0x0327) . chr(0x0300) . chr(0x0301) . 'def'),
+    utf8_to_marc8('abc' . chr(0x0327) . chr(0x0300) . chr(0x0301) 
+	. 'def'),
     'ab' . chr(0xF0) . chr(0xE1) . chr(0xE2) . 'cdef',
     'string with multiple interior combining characters'
 );
@@ -101,7 +102,7 @@
 is(
     utf8_to_marc8(chr(0x043A)),
     ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x4B) .
-    ESCAPE . ASCII_DEFAULT ,
+	ESCAPE . ASCII_DEFAULT ,
     'CYRILLIC SMALL LETTER KA'
 );
 
@@ -109,8 +110,8 @@
 is(
     utf8_to_marc8(chr(0x05D0) . chr(0x043B)),
     ESCAPE . SINGLE_G0_A . BASIC_HEBREW . chr(0x60) .
-    ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x4C) .
-    ESCAPE . ASCII_DEFAULT,
+	ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x4C) .
+	ESCAPE . ASCII_DEFAULT,
     'string with multiple character sets'
 );
 

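The 'string with multiple interior combining characters' test expects each combining mark to come before the letter it modifies. For example, 'abc' followed by three marks becomes 'ab', then the marks, then 'c'. This reflects the MARC-8 convention that diacritics precede their base character. utf8_to_marc8() first applies NFD, as the context line `$utf8 = NFD($utf8);` shows.

The sketch below reproduces only the decomposition and reordering. It does not do the table lookup that turns U+0327, U+0300 and U+0301 into 0xF0, 0xE1 and 0xE2. marc8_mark_order() is a hypothetical name.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose to NFD, then move each run of combining
# marks (\p{M}) in front of the base character it follows.
sub marc8_mark_order {
    my ($str) = @_;
    $str = NFD($str);
    $str =~ s/(\P{M})(\p{M}+)/$2$1/g;
    return $str;
}

my $reordered = marc8_mark_order("abc\x{327}\x{300}\x{301}def");
my $e_acute   = marc8_mark_order("\x{e9}");    # precomposed e-acute
```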



More information about the Pkg-perl-cvs-commits mailing list