[Po4a-devel] error parsing document header

Thomas Blein tblein at tblein.eu
Thu Sep 27 12:54:54 UTC 2012


Hi,

On Thursday 27 Sept. 2012 at 08:24:46 (-0400), David Prévot wrote:
> On 27/09/2012 at 07:55, D. Barbier wrote:
> 
> > Indeed, this is due to accented characters.
> > It seems that length() returns the number of bytes and not characters.
> >  I looked at Unicode issues with Perl a very long time ago and do not
> > remember about its quirks; if anyone has a clue, please tell ;-)
> 
> Thomas, CCed, helped us a lot for the DPNhtml2mail script [0], and
> managed to make that work.

Not really me: it was Ryuunosuke Ayanokouzi. He helped a lot, since for
Japanese it is even worse than for accented characters.

I will just point out the portion of code that gets the real length
(the display width in columns) of a UTF-8 string:

use Unicode::GCString;
# Return the display width (in columns) of a decoded character string.
sub columns {
    return Unicode::GCString->new(shift)->columns();
}

After that, you can use the columns function instead of length.
The rest of the code made sure that the string was not empty and was
encoded in UTF-8, converting it to UTF-8 if not. Depending on the
application, you may not need that part.
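
To illustrate the difference between bytes, characters and columns, here
is a minimal sketch. It is not the DPNhtml2mail code: the to_characters
helper is hypothetical, and it assumes that any input not already flagged
by Perl as a character string is UTF-8 encoded bytes. Depending on where
your strings come from, that assumption may not hold.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::GCString;

# Display width in columns, same helper as in the snippet above.
sub columns {
    return Unicode::GCString->new(shift)->columns();
}

# Hypothetical helper: return a decoded character string.
# Assumption: input without Perl's internal UTF-8 flag is raw UTF-8 bytes.
sub to_characters {
    my ($text) = @_;
    return '' unless defined $text && length $text;
    return utf8::is_utf8($text) ? $text : decode('UTF-8', $text);
}

# "日本語" written as raw UTF-8 bytes, so the source file encoding does not matter.
my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
my $chars = to_characters($bytes);

printf "bytes:      %d\n", length $bytes;     # 9
printf "characters: %d\n", length $chars;     # 3
printf "columns:    %d\n", columns($chars);   # 6
```

On raw bytes, length() counts bytes. On the decoded string it counts
characters, and columns() additionally accounts for wide characters such
as Japanese ones, which take two columns each.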

Best regards,

Thomas

PS: keep me in CC if you still want my input; I am not subscribed to
the Po4a-devel list.



