[Po4a-devel] error parsing document header

D. Barbier bouzim at gmail.com
Sun Sep 30 23:12:34 UTC 2012


On 2012/9/27 D. Barbier wrote:
> On 2012/9/27 David Prévot wrote:
>>
>> Hi,
>>
>> On 27/09/2012 07:55, D. Barbier wrote:
>>
>>> Indeed, this is due to accented characters.
>>> It seems that length() returns the number of bytes rather than characters.
>>> I looked at Unicode issues with Perl a very long time ago and do not
>>> remember its quirks; if anyone has a clue, please tell ;-)
>>
>> Thomas, CCed, helped us a lot with the DPNhtml2mail script [0], and
>> managed to make it work.
>>
>> [0] http://anonscm.debian.org/viewvc/publicity/dpn/scripts/DPNhtml2mail.pl?view=co
>>
>> I guess the magic happens at the end of the following code:
>>
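>> use Encode qw(decode_utf8);
>> use Unicode::GCString;
>>
>> # number of columns (display width) of a string
>> sub _columns {
>>     my $str = shift;
>>
>>     return 0 if ( !defined $str || $str eq '' );
>>
>>     # decode raw UTF-8 bytes into a character string first
>>     $str = decode_utf8($str) unless utf8::is_utf8($str);
>>     return Unicode::GCString->new($str)->columns();
>> }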
>
> Thanks David,
>
> This seems to be different: you are computing the string width, whereas
> I need the number of characters.
> I believe all we need is to add an ":encoding(foo)" layer when
> opening files for reading; the encoding must be specified and is thus
> known.
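A minimal Perl sketch of the difference discussed above, assuming UTF-8 input: length() on raw bytes counts octets, while decoding the data (explicitly, or through an :encoding() layer when reading) makes it count characters. The sample string and the in-memory filehandle are illustrative only, not po4a code.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 bytes for "Prévot"; 'é' (U+00E9) is encoded as two octets.
my $raw = "Pr\xC3\xA9vot";

# Undecoded: length() counts octets.
my $byte_len = length $raw;

# Decoded explicitly: length() counts characters.
my $char_len = length decode('UTF-8', $raw);

# Same result if an :encoding() layer decodes while reading.
# An in-memory filehandle stands in for the real input file.
open my $fh, '<:encoding(UTF-8)', \$raw or die "cannot open: $!";
my $line = <$fh>;
close $fh;
my $layer_len = length $line;

print "bytes=$byte_len chars=$char_len layer=$layer_len\n";
```

This prints "bytes=7 chars=6 layer=6". Note that a character count is still not a display width: combining marks and wide CJK characters make the two differ.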

Hello,

I was wrong: we need the text width. I checked in some commits that use
Unicode::GCString when it is available; thanks to both of you for the help.
The downside is that there is one new string in PO files.
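One way to use Unicode::GCString only when it is installed, sketched in Perl. This is a hypothetical helper (text_width is an illustrative name, not the code actually committed to po4a); it follows the _columns function quoted above and, as an assumption, falls back to a plain character count, which is only accurate for narrow text.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Unicode::GCString is optional (not core Perl): load it at run time if present.
my $have_gcstring = eval { require Unicode::GCString; 1 } ? 1 : 0;

# Hypothetical helper: display width of a string in columns.
# Assumption: without Unicode::GCString, fall back to counting characters,
# which is wrong for wide (CJK) or combining characters.
sub text_width {
    my ($str) = @_;
    return 0 if !defined $str || $str eq '';
    # Same heuristic as _columns: treat strings without the UTF8 flag
    # as raw UTF-8 bytes and decode them.
    $str = decode_utf8($str) unless utf8::is_utf8($str);
    return $have_gcstring
        ? Unicode::GCString->new($str)->columns()
        : length $str;
}

my $w = text_width("Pr\xC3\xA9vot");
print "width=$w (GCString: $have_gcstring)\n";
```

For accented Latin text both paths agree (width 6 here). With Unicode::GCString installed, two CJK ideographs measure 4 columns, while the fallback reports 2, which is why the width-aware path is preferred when available.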

Denis
