[Po4a-devel] [ po4a-Bugs-301335 ] PHP code inside CDATA is parsed as processing instruction

noreply at alioth.debian.org noreply at alioth.debian.org
Mon Nov 21 20:43:11 UTC 2005


Bugs item #301335, was opened at 29/03/2005 16:20
You can respond by visiting: 
http://alioth.debian.org/tracker/?func=detail&atid=410622&aid=301335&group_id=30267

Category: Sgml.pm
Group: None
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Chris Karakas (be-my-guest)
>Assigned to: Nicolas FRANCOIS (nekral-guest)
Summary: PHP code inside CDATA is parsed as processing instruction

Initial Comment:
Suppose you have the following in your SGML file:

<screen><![CDATA[<?php
]]><![CDATA[include("config.php");
]]><![CDATA[mysql_connect("$dbhost", "$dbuname", "$dbpass");
]]><![CDATA[mysql_select_db("$dbname");
]]><![CDATA[echo mysql_error();
]]><![CDATA[phpinfo();
]]><![CDATA[?> 
]]></screen>

This will be transformed to:

<screen>{PO4A-beg-CDATA}<?php
{PO4A-end}{PO4A-beg-CDATA}include("config.php");
{PO4A-end}{PO4A-beg-CDATA}mysql_connect("$dbhost", "$dbuname", "$dbpass");
{PO4A-end}{PO4A-beg-CDATA}mysql_select_db("$dbname");
{PO4A-end}{PO4A-beg-CDATA}echo mysql_error();
{PO4A-end}{PO4A-beg-CDATA}phpinfo();
{PO4A-end}{PO4A-beg-CDATA}?>       
{PO4A-end}</screen>

in the intermediate temporary file that is passed to nsgmls through the SGMLS module.

This will produce the error:

(po4a::sgml)
               Unknown SGML event type: pi

The reason is that in SGML.pm the value of $event->type is "pi", meaning "Processing Instruction". The SGMLS module reads the "{PO4A-beg-CDATA}", does NOT understand that we are inside CDATA, then reads "<?php" and thinks it has a processing instruction, since it sees something not in CDATA that starts with "<?". So SGMLS sets $event->type to "pi".
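
The mechanism can be illustrated with a minimal Python sketch. This is not po4a's Perl code: the function names mask_cdata and looks_like_pi are hypothetical, and the "parser" is only a crude stand-in for nsgmls. Only the placeholder strings are taken from the intermediate file shown above.

```python
import re

# Placeholder strings as they appear in the intermediate file above.
BEG = "{PO4A-beg-CDATA}"
END = "{PO4A-end}"


def mask_cdata(sgml: str) -> str:
    """Replace the CDATA delimiters with placeholders (hypothetical helper)."""
    return sgml.replace("<![CDATA[", BEG).replace("]]>", END)


def looks_like_pi(text: str) -> bool:
    """Crude stand-in for the parser: '<?' outside real CDATA starts a PI."""
    # Drop real CDATA sections first; any '<?' left over would be read as a PI.
    outside = re.sub(r"<!\[CDATA\[.*?\]\]>", "", text, flags=re.DOTALL)
    return "<?" in outside


original = "<screen><![CDATA[<?php\n]]><![CDATA[phpinfo();\n]]></screen>"
masked = mask_cdata(original)
```

With the real delimiters in place the "<?php" is protected; once they are replaced by placeholders, nothing tells the parser that the text is literal.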

But "pi" events are not handled by SGML.pm, so the if-elsif-elsif... construct that tests $event->type in SGML.pm ends in:

        else {
            die wrap_ref_mod($refs[$parse->line], "po4a::sgml", dgettext("po4a","Unknown SGML event type: %s"), $event->type);

        }

and we see the error above.
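
The shape of that failure can be sketched in Python as follows. The function handle_event and the three handled event names are illustrative assumptions, not the actual list of branches in SGML.pm; the point is only that any type without a branch falls through to the final error.

```python
def handle_event(event_type: str, data: str) -> str:
    """Dispatch on the event type; unknown types abort, like the die above."""
    if event_type == "start_element":
        return f"<{data}>"
    elif event_type == "end_element":
        return f"</{data}>"
    elif event_type == "cdata":
        return data
    else:
        # No branch for "pi", so a processing instruction ends up here.
        raise ValueError(f"Unknown SGML event type: {event_type}")
```

Calling handle_event("pi", "php") raises the error, while handle_event("cdata", "phpinfo();") passes the text through unchanged.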

The solution must be some kind of check that says:

"If you are inside a CDATA of the *original* file and get an event type of 'pi', then treat the data as part of the CDATA, not as processing instruction."

However, this is bound to be tricky, since there might be multiple opening and closing CDATA tags on one line. Simple checks with regexps will not do. Actually, the best way would be to consult the parser itself - but po4a tricks the parser by changing "<![CDATA[" strings to {PO4A-beg-CDATA}, so we must "do the parser's work" here. I find it a bad idea. We will never be better than the parser itself. We open a Pandora's box here. You have been warned.
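
As a rough idea of what such a check would involve, here is a Python sketch that scans for the placeholders instead of using a single regexp, so several sections on one line are handled. The names cdata_spans and pi_is_really_cdata are hypothetical, and the sketch assumes that "{PO4A-end}" always closes the most recent "{PO4A-beg-CDATA}"; if other placeholder types share the same end marker and can nest, this is not sufficient.

```python
BEG = "{PO4A-beg-CDATA}"
END = "{PO4A-end}"


def cdata_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of text enclosed by the CDATA placeholders."""
    spans = []
    pos = 0
    while True:
        start = text.find(BEG, pos)
        if start == -1:
            break
        start += len(BEG)
        end = text.find(END, start)
        if end == -1:
            end = len(text)  # unterminated section: assume it runs to the end
        spans.append((start, end))
        pos = end + len(END)
    return spans


def pi_is_really_cdata(text: str, offset: int) -> bool:
    """True if the '<?' found at `offset` lies inside a masked CDATA section."""
    return any(start <= offset < end for start, end in cdata_spans(text))
```

For a line such as "{PO4A-beg-CDATA}<?php{PO4A-end}<?xml{PO4A-beg-CDATA}?>{PO4A-end}", the "<?php" would be classified as CDATA and the "<?xml" as a genuine processing instruction.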

But we need to fix this, if we want to do real work with po4a...maybe it's time to abandon this "{PO4A-beg|end-*}" idea after all.

How to test
===========

Get the file

http://www.karakas-online.de/EN-Book/EN-Book.sgml

from 

http://www.karakas-online.de/EN-Book/formats.html

As you can see, this is a perfect, valid, bug-free SGML document, whose rendering in HTML you can admire in

http://www.karakas-online.de/EN-Book/

To reproduce the error, you must create an empty bibliography.sgml file:

touch bibliography.sgml

then run 

po4a-gettextize -v --option debug=generic -f sgml -m EN-Book.sgml -M iso-8859-1 -p EN-Book-en.po

You will see some other errors first:

- An error saying "CONTRIB not recognized". Go and enter contrib in the list of docbook tags in SGML.pm:

        $self->set_tags_kind("translate" => "abbrev acronym arg artheader attribution ".
                                            "contrib ".
                                            "date ".


- An error saying "KEYWORD not recognized". Go and change "kerword" to "keyword" in SGML.pm:

                                            "imageobject important index indexterm informaltable itemizedlist ".
                                            "keyword keywordset ".
                                            "legalnotice listitem lot "

After these two changes, SGML.pm will be able to continue processing up to the point where it encounters the above situation.

----------------------------------------------------------------------

>Comment By: Nicolas FRANCOIS (nekral-guest)
Date: 12/05/2005 01:41

Message:
Logged in
user_id=10852

Hello,

Sorry for the delay (we mostly use the mailing list and I did not check the tracker).
Anyway, thanks for the report and your detailed analysis.

This problem has now been dealt with.
A fix is in CVS.

I can send you a patch if you prefer.

Best Regards,
-- 
Nekral

----------------------------------------------------------------------
