[Po4a-devel] Sgml module does not translate the lang attribute

Francois Gouget fgouget@codeweavers.com
Thu, 26 May 2005 13:32:27 +0200



Martin Quinson wrote:
[...]
> if (lc($attr) eq "lang") {
>   $tag .= ' '.lc($attr).'='.$NEWLANG;
> } else { 
>   $tag .= ' '.lc($attr).'='.$value;
> }

Thanks for the hint. I started from there, looked at what Xml.pm does,
and I think I have a solution. It turned out to be simpler than I
expected, but maybe I missed something.

> Not necessary really clean and robust, but working, I *guess*. But do you
> know where we can get NEWLANG from?

The Xml.pm module simply creates an msgid from the lang value and lets
the translator provide the new value. In fact, it keeps a list of
attributes that need translation, so that translations can be provided
for arbitrary attributes.

I think this approach makes sense so here is what I did:

  * I added a new 'attribute' 'tag kind'. I know that an attribute is
not a tag, but this lets me reuse the existing framework.

  * I added support for an 'attribute' option which lets the user expand 
the list of attributes that need to be translated. It works exactly like 
the existing 'translate', 'section', 'indent', etc. options...

  * For the DocBook document type I added 'lang' to the 'attribute' list.

  * And finally, near line 690, I translate the attribute value if its
name is in the %attribute hash.
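As a rough illustration of the mechanism described above, here is a
standalone Perl sketch that mimics it outside po4a: the space separated
'attribute' option is split into an %attribute hash keyed by upper-case
names (mirroring the uc() lookups in the patch), and the value of a
matching attribute is passed through a translation function. The %po
catalogue and the translate_msgid() and translate_attr() helpers are
hypothetical stand-ins for the PO file and $self->translate(); this is
not the patch code itself.

```perl
use strict;
use warnings;

# Hypothetical PO catalogue standing in for the translator's work.
my %po = ("en" => "fr");

# Hypothetical stand-in for $self->translate(): look the msgid up,
# falling back to the original string when no translation exists.
sub translate_msgid {
    my ($msgid) = @_;
    return exists $po{$msgid} ? $po{$msgid} : $msgid;
}

# Build the lookup hash from a space separated option, as the patch does.
my $option = "lang";
my %attribute;
$attribute{uc $_} = 1 foreach split(/ /, $option);

# Translate an attribute value only if its name is in %attribute,
# then return it as name="value".
sub translate_attr {
    my ($attr, $value) = @_;
    if ($attribute{uc $attr} && defined $value && length $value) {
        $value = translate_msgid($value);
    }
    return lc($attr) . '="' . $value . '"';
}

print translate_attr("LANG", "en"), "\n";
print translate_attr("ID", "en"), "\n";
```

With the catalogue above, the lang attribute comes out as lang="fr"
while id="en" is left untouched, because ID is not in the list.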

There is one Xml.pm feature that this patch does not support: Xml.pm
lets you specify that an attribute must only be translated when it is
found inside specific tags:

    You can specify the attributes by their name (for example, "lang"),
    but you can prefix it with a tag hierarchy, to specify that this
    attribute will only be translated when it's into the specified tag.
    For example: <bbb><aaa>lang specifies that the lang attribute will
    only be translated if it's into an <aaa> tag, and it's into a <bbb>
    tag.

I think this functionality can be added later if needed. For now it 
seems mostly overkill to me.
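For reference, if this is ever needed, the tag-hierarchy restriction
quoted above could be emulated roughly as follows. This is a
hypothetical sketch, not Xml.pm's actual code: it assumes the parser
keeps a stack of currently open tags (innermost last) and that a spec
such as <bbb><aaa>lang matches when the innermost open tags are bbb and
then aaa. Xml.pm's exact matching rules may differ. The
attribute_matches() helper is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical helper: does $spec (e.g. "<bbb><aaa>lang" or plain "lang")
# apply to attribute $attr, given the open tags in @$stack (innermost last)?
sub attribute_matches {
    my ($spec, $stack, $attr) = @_;
    # Split "<bbb><aaa>lang" into the tag path "<bbb><aaa>" and "lang".
    my ($path, $name) = $spec =~ /^((?:<[^<>]+>)*)(.+)$/ or return 0;
    return 0 unless lc($name) eq lc($attr);
    my @tags = $path =~ /<([^<>]+)>/g;
    return 0 if @tags > @$stack;
    # Compare the required tags with the innermost open tags, in order.
    my @inner = @$stack[ @$stack - @tags .. $#$stack ];
    for my $i (0 .. $#tags) {
        return 0 unless lc($tags[$i]) eq lc($inner[$i]);
    }
    return 1;
}
```

For example, attribute_matches("<bbb><aaa>lang", ["book", "bbb", "aaa"],
"LANG") returns true, whereas the same call with the stack
["book", "aaa"] returns false. A plain "lang" spec matches regardless of
the enclosing tags, which is what the patch does today.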

There is another aspect that could be criticised: the patch will
typically produce the following PO entry:

msgid "en"
msgstr "fr"

This can be ambiguous: the documentation may contain an 'en' elsewhere
that needs to be translated differently. If this is felt to be an issue,
it would be pretty easy to modify the patch so that the msgid reads as
follows:

msgid "lang=en"
msgstr "lang=fr"

This could be done with the following (untested) code:

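if ($attribute{uc($attr)})
{
     my $name = lc($attr);
     # Use "name=value" (e.g. "lang=en") as the msgid to avoid ambiguity
     my $translated = $self->translate("$name=$value", "", "attribute $name");
     # Strip the "name=" prefix back off the translation
     if ($translated =~ s/^\Q$name\E=//)
     {
         $value = $translated;
     }
     else
     {
         # The translator dropped the prefix: keep the original value
         warn "po4a::sgml: bad translation for attribute $name: '$translated'\n";
     }
}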

Note that this code should not have any quoting issues.
Let me know if you prefer this alternate version. In that case it would 
be best to modify Xml.pm in the same way to keep things consistent.
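To make the trade-off concrete, here is a standalone Perl sketch of the
prefixed variant. The %po catalogue and the translate_msgid() and
translate_prefixed() helpers are hypothetical stand-ins for the PO file
and $self->translate(); the final quoting step copies the rule from the
patch (single quotes if the value contains a double quote, double quotes
otherwise). Because a plain 'en' in the text and the 'lang=en' attribute
now have different msgids, they can be translated independently, and the
prefix never reaches the quoting step.

```perl
use strict;
use warnings;

# Hypothetical catalogue: the same "en" gets different translations
# depending on whether it is body text or a lang attribute value.
my %po = (
    "en"      => "anglais",
    "lang=en" => "lang=fr",
);

# Hypothetical stand-in for $self->translate().
sub translate_msgid {
    my ($msgid) = @_;
    return exists $po{$msgid} ? $po{$msgid} : $msgid;
}

# Translate an attribute value using a "name=value" msgid, then quote it.
sub translate_prefixed {
    my ($attr, $value) = @_;
    my $name = lc $attr;
    my $translated = translate_msgid("$name=$value");
    if ($translated =~ s/^\Q$name\E=//) {
        $value = $translated;
    } else {
        # Keep the original value if the translator dropped the prefix.
        warn "bad translation for attribute $name: '$translated'\n";
    }
    # Same quoting rule as the patch.
    $value = ($value =~ m/"/) ? "'$value'" : "\"$value\"";
    return "$name=$value";
}

print translate_msgid("en"), "\n";
print translate_prefixed("LANG", "en"), "\n";
```

Here the body text 'en' becomes 'anglais' while the attribute becomes
lang="fr"; an untranslated value such as lang=de simply passes through.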


Changelog:

  * lib/Locale/Po4a/Sgml.pm

    Francois Gouget <fgouget@codeweavers.com>
    Add support for translating attribute values.


-- 
Francois Gouget
fgouget@codeweavers.com


Attachment: po4a-20050526-attribute.diff

Index: lib/Locale/Po4a/Sgml.pm
===================================================================
RCS file: /cvsroot/po4a/po4a/lib/Locale/Po4a/Sgml.pm,v
retrieving revision 1.57
diff -u -p -r1.57 Sgml.pm
--- lib/Locale/Po4a/Sgml.pm	25 May 2005 16:55:20 -0000	1.57
+++ lib/Locale/Po4a/Sgml.pm	26 May 2005 10:44:59 -0000
@@ -82,6 +82,10 @@ they can be part of an msgid. For exampl
 for this category since putting it in the translate section would create
 msgids not being whole sentences, which is bad.
 
+=item attribute
+
+A space separated list of attributes that need to be translated.
+
 =item force
 
 Proceed even if the DTD is unknown.
@@ -274,13 +286,13 @@ sub set_tags_kind {
     my $self=shift;
     my (%kinds)=@_;
 
-    foreach (qw(translate empty section verbatim ignore)) {
+    foreach (qw(translate empty section verbatim ignore attribute)) {
 	$self->{SGML}->{k}{$_} = $self->{options}{$_} ? $self->{options}{$_}.' ' : '';
     }
     
     foreach (keys %kinds) {
 	die "po4a::sgml: internal error: set_tags_kind called with unrecognized arg $_"
-	    if ($_ !~ /^(translate|empty|verbatim|ignore|indent)$/);
+	    if ($_ !~ /^(translate|empty|verbatim|ignore|indent|attribute)$/);
 	
 	$self->{SGML}->{k}{$_} .= $kinds{$_};
     }    
@@ -426,7 +438,8 @@ sub parse_file {
 	                                    "varname ".
 	                                    "wordasword ".
 	                                    "xref ".
-                                            "year");
+                                            "year",
+                             "attribute" => "lang");
 
     } else {
 	if ($self->{options}{'force'}) {
@@ -616,7 +629,7 @@ sub parse_file {
     open (IN,$cmd) || die wrap_mod("po4a::sgml", dgettext("po4a", "Can't run nsgmls: %s"), $!);
 
     # The kind of tags
-    my (%translate,%empty,%verbatim,%indent,%exist);
+    my (%translate,%empty,%verbatim,%indent,%exist,%attribute);
     foreach (split(/ /, ($self->{SGML}->{k}{'translate'}||'') )) {
 	$translate{uc $_} = 1;
 	$indent{uc $_} = 1;
@@ -639,7 +652,10 @@ sub parse_file {
     foreach (split(/ /, ($self->{SGML}->{k}{'ignore'}) || '')) {
 	$exist{uc $_} = 1;
     }
-   
+      foreach (split(/ /, ($self->{SGML}->{k}{'attribute'}) || '')) {
+	$attribute{uc $_} = 1;
+    }
+ 
 
     # What to do before parsing
 
@@ -704,6 +720,10 @@ sub parse_file {
                 if ($val->type() eq 'CDATA' ||
 		    $val->type() eq 'IMPLIED') {
 		    if (defined $value && length($value)) {
+                        if ($attribute{uc($attr)})
+                        {
+                            $value = $self->translate($value, "", "attribute ".lc($attr));
+                        }
 			if ($value =~ m/"/) { #"
 			    $value = "'".$value."'";
 			} else {
