[Po4a-devel] error parsing document header

D. Barbier bouzim at gmail.com
Thu Sep 27 13:05:30 UTC 2012


On 2012/9/27 David Prévot wrote:
> Hi,
>
> On 27/09/2012 07:55, D. Barbier wrote:
>
>> Indeed, this is due to accented characters.
>> It seems that length() returns the number of bytes and not characters.
>> I looked at Unicode issues with Perl a very long time ago and do not
>> remember its quirks; if anyone has a clue, please tell ;-)
>
> Thomas, CCed, helped us a lot with the DPNhtml2mail script [0] and
> managed to make it work.
>
> 0: http://anonscm.debian.org/viewvc/publicity/dpn/scripts/DPNhtml2mail.pl?view=co
>
> I guess the magic happens at the end of the following code:
>
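> use Encode qw(decode_utf8);
> use Unicode::GCString;
>
> # number of columns (display width) of a string
> sub _columns {
>     my $str = shift;
>
>     # empty or undefined strings take no columns
>     return 0 if ( !defined $str || $str eq '' );
>
>     # decode raw UTF-8 bytes into characters unless already decoded
>     $str = decode_utf8($str) unless utf8::is_utf8($str);
>     # count display columns (wide East Asian characters count as 2)
>     return Unicode::GCString->new($str)->columns();
> }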
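
For reference, the byte/character mix-up behind the length() problem can be reproduced with the core Encode module alone. This is a minimal sketch, assuming UTF-8 input; the sample string is made up for illustration. length() counts the characters of whatever string it is given, so on an undecoded byte string every byte of "é" counts separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# "café" with "é" written as its two UTF-8 bytes (0xC3 0xA9), as it
# would arrive from a file opened without any encoding layer.
my $bytes = "caf\xc3\xa9";

# decode_utf8 turns the byte string into a character string.
my $chars = decode_utf8($bytes);

# Prints "bytes: 5, characters: 4".
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

Unlike _columns above, this counts characters, not display columns; the two only differ for wide or combining characters.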

Thanks David,

This seems to be a different issue: you are computing the display width
of the string, whereas I need its number of characters.
I believe all we need is to add an ":encoding(foo)" layer when opening
the file for reading; the encoding must be specified, and is thus
known.
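
A minimal sketch of that idea, assuming the specified charset is available in a variable (the name $encoding is hypothetical, not po4a's actual option handling). It writes a small UTF-8 test file, then reads it back once without a layer and once with an ":encoding(...)" layer, so length() sees bytes in the first case and characters in the second.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical: the user-specified charset of the input document.
my $encoding = 'UTF-8';

# Create a small test file containing "café\n" encoded as UTF-8.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "caf\x{e9}\n";
close($out) or die "close $path: $!";

# Without a layer, the line is read as raw bytes.
open(my $raw, '<', $path) or die "open $path: $!";
my $raw_line = <$raw>;
close($raw);
chomp $raw_line;

# With an :encoding(...) layer, the line is decoded into characters.
open(my $in, "<:encoding($encoding)", $path) or die "open $path: $!";
my $line = <$in>;
close($in);
chomp $line;

# Prints "raw: 5, decoded: 4".
printf "raw: %d, decoded: %d\n", length($raw_line), length($line);
```

":encoding(UTF-8)" is preferable to ":utf8" for input, since the latter only marks the data as UTF-8 without validating it.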

Denis


