[Po4a-devel]Winedocs and Po4a dependencies

Fri, 27 May 2005 10:45:42 +0200

Martin Quinson wrote:
[...]
>> * Locale::gettext (perl module)
>>   Needed by po4a for localization.
>>   Provided by liblocale-gettext-perl on Debian, perl-Locale-gettext on
>>   Mandrake and Fedora Core (DAG), perl-gettext on SUSE.
>>
>>   Would you be open to a patch that acted as a wrapper around
>>   Locale::gettext so that po4a would continue to work untranslated if that
>>   module was missing?
[...]
> Ok. Great. It could even be done by default in the Common.pm module, which
> would then export the d?gettext functions. Impact on other parts would be
> the need to kill the explicit gettext loading.

Good idea. I'll do that.
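
To make the idea concrete, here is a minimal sketch of such a fallback. It is not the actual patch: the names my_gettext and my_dgettext are hypothetical, and where this would live in Common.pm is left open. It only assumes that Locale::gettext, when installed, provides gettext() and dgettext().

```perl
use strict;
use warnings;

# Try to load Locale::gettext at run time. If the module is missing,
# the eval fails quietly and we fall back to returning messages as-is.
my $have_gettext = eval { require Locale::gettext; 1 };

# Hypothetical wrapper name: translate if possible, else pass through.
sub my_gettext {
    my ($msgid) = @_;
    return $have_gettext ? Locale::gettext::gettext($msgid) : $msgid;
}

# Hypothetical wrapper name: same idea for an explicit text domain.
sub my_dgettext {
    my ($domain, $msgid) = @_;
    return $have_gettext
        ? Locale::gettext::dgettext($domain, $msgid)
        : $msgid;
}

print my_gettext("Hello, world"), "\n";
```

Callers would then use the wrappers only and never load Locale::gettext themselves.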

>> * Text::WrapI18N (perl module)
>>   Pure perl (so easy to check in) but depends on Text::CharWidth which
>>   is not pure perl.
>>   Provided by libtext-wrapi18n-perl on Debian. Found no RPM packages
>>   providing it.
>>
>>   Text::WrapI18N was not used in po4a 0.16.2. I initially thought it
>>   was used to wrap the text being output to the .po and .sgml files but in
>>   fact it seems to only be used to print messages, warnings and errors.
>>   Why is it needed? Doesn't a simple print work fine?
> 
> This module becomes important when you want to wrap CJK languages
> (Japanese, Korean), which don't have any spaces, if I understood well. So
> finding where you can cut the sentence properly is not as easy as in, say,
> French.
> 
> So, actually, we *ought* to use it all around the place. Maybe through a
> wrapper such as the one you propose for gettext...

Ok. I can understand why it would be important to use it for writing PO 
files and SGML files. It's a shame it's used everywhere but there ;-) 
(I've seen your other email explaining why it's that way).

What I don't understand is why it is used to print informational and 
error messages. Won't the xterm wrap things on its own? It does just 
fine when I print a very long line in French or English. Doesn't it do 
the same for CJK languages? If not, I'd say it is pretty broken.

My next question has to do with the wrapper implementation. I did not 
check it in detail but it seems to work on the multibyte string. 
Wouldn't it be possible to implement a wrapper using the Perl 5.8 
Unicode support? It would work like this:

    use Encode;
    # "No&euml;l No&euml;l" as UTF-8 bytes (source saved as UTF-8, no "use utf8")
    my $oct="Noël Noël";

Here length($oct) = 11 because each &euml; is encoded as two bytes.

    my $str=Encode::decode("UTF-8", $oct);

Here length($str) = 9 because each &euml; is now a single Unicode 
character. This means we can cut up the string as we want:

    my $sub=Encode::encode("UTF-8",substr($str,0,3));

Here $sub contains "Noë", that is "No&euml;" in UTF-8 and has a length 
of 4.

There's still the issue of word boundaries because, at least for the 
SGML file, we would not want to cut a word in two. But I would expect 
that, on Unicode strings, \b, \w and \W would work sensibly even for CJK 
languages. The advantage is that this would make it possible to do 
wrapping using only standard Perl features. The drawback is that it 
would require Perl 5.8.
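
As a rough illustration of the approach, here is a sketch of a greedy wrapper built only on Encode. The function name wrap_utf8 is hypothetical. It breaks at whitespace only and counts characters, not display columns, so it does not handle double-width characters or text that has no spaces at all; those cases would need more work.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: wrap a UTF-8 byte string at whitespace so that
# no line exceeds $width characters (characters, not bytes or columns).
# A single word longer than $width is left on a line of its own.
sub wrap_utf8 {
    my ($octets, $width) = @_;
    my $text = decode("UTF-8", $octets);    # bytes -> characters
    my @lines;
    my $line = "";
    for my $word (split ' ', $text) {
        if ($line eq "") {
            $line = $word;
        } elsif (length($line) + 1 + length($word) <= $width) {
            $line .= " $word";
        } else {
            push @lines, $line;
            $line = $word;
        }
    }
    push @lines, $line if $line ne "";
    # characters -> bytes again, ready to print
    return map { encode("UTF-8", $_) } @lines;
}

# "\xc3\xab" is the UTF-8 encoding of e-diaeresis.
my @out = wrap_utf8("No\xc3\xabl No\xc3\xabl No\xc3\xabl", 9);
print "$_\n" for @out;
```

With a width of 9 the first two words fit on one line (9 characters, 11 bytes); a wrapper counting bytes would have split them.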

-- 
Francois Gouget
fgouget@codeweavers.com