XML parsing and transformation in PHP?

I have a custom XML schema for page display: the page template contains custom XML elements, which are evaluated and replaced with content when the page is rendered. This is currently implemented with the preg regex functions, primarily the excellent preg_replace_callback function, e.g.:

// ...
$s = preg_replace_callback('!<field>(.*?)</field>!', 'replace_field', $s);
// ...

function replace_field($groups) {
  global $fields; // $fields (name => value) must be visible inside the callback
  return isset($fields[$groups[1]]) ? $fields[$groups[1]] : '';
}

Just as an example.

Now, this works pretty well... as long as the XML elements aren't nested. Once they are, it gets a lot more complicated, for example if you have:

<field name="outer">
  <field name="inner">
    ...
  </field>
</field>

You want to make sure you replace the innermost field first. Judicious use of greedy/non-greedy regex patterns can go some of the way toward handling these more complicated scenarios, but the clear message is that I'm reaching the limits of what regex can reasonably do and I really need to be doing proper XML parsing.
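For the record, here is roughly how far regexes alone can be pushed. This is a minimal sketch, assuming attributes are always written as name="..." and that no comment, CDATA section or attribute value contains the text "<field". A "tempered" pattern matches only fields that contain no further "<field" opening tag, and repeating the replacement until the string stops changing expands the innermost fields first. The function name expand_fields and the bracket-wrapping callback are hypothetical, for illustration only.

```php
<?php
// Sketch: expand nested <field name="..."> elements innermost-first with regexes only.
// Assumes attributes are always written as name="..." and that no comment, CDATA
// section or attribute value contains the text "<field". Names here are hypothetical.
function expand_fields(string $s, callable $callback): string
{
    // Tempered pattern: the body may not contain another "<field" opening tag, so only
    // innermost fields match. The ~ delimiter avoids clashing with the "!" in "(?!".
    $innermost = '~<field\s+name="([^"]*)"\s*>((?:(?!<field\b).)*?)</field>~s';

    // Each pass replaces the current innermost fields; repeat until nothing changes.
    do {
        $before = $s;
        $s = preg_replace_callback($innermost, function (array $m) use ($callback): string {
            return $callback($m[1], $m[2]); // field name, already-expanded body
        }, $s);
    } while ($s !== $before);

    return $s;
}

// Hypothetical callback: wrap each field's body in brackets tagged with its name.
$result = expand_fields(
    '<field name="outer">a<field name="inner">b</field>c</field>',
    fn (string $name, string $body): string => "[$name:$body]"
);
echo $result, "\n"; // [outer:a[inner:b]c]
```

It handles the nested example, but it silently breaks on single-quoted attributes or self-closing tags, and a callback whose output itself contains a field element could keep the loop going indefinitely. That fragility is exactly why a real parser is the better tool.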

What I'd like is an XML transformation package that:

- allows me to conditionally evaluate/include the contained document tree (or not), ideally based on a callback function (analogous to preg_replace_callback);
- can handle nested elements of the same or different types; and
- handles attributes in a nice way (e.g. as an associative array).
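For comparison, this wish list maps fairly directly onto PHP's DOM extension. The following is a minimal sketch rather than a package: it assumes the template is well-formed XML with a single root element, and render() and the $handlers array (tag name => callback) are hypothetical names. Children are rendered before their parent, so inner elements are handled first; each callback receives the attributes as an associative array plus the already-rendered content, and returning '' drops the element and everything inside it.

```php
<?php
// Sketch: expand custom elements with callbacks, innermost first, using ext/dom.
// render() and the $handlers structure (tag name => callback) are hypothetical names.
function render(DOMNode $node, array $handlers): string
{
    if ($node instanceof DOMText) {
        // Text (and CDATA) nodes: re-escape so the output stays valid markup.
        return htmlspecialchars($node->data, ENT_XML1);
    }
    if (!($node instanceof DOMElement)) {
        return ''; // comments, processing instructions, etc. are dropped in this sketch
    }

    // Attributes as an associative array, e.g. ['name' => 'outer'].
    $attrs = [];
    foreach ($node->attributes as $attr) {
        $attrs[$attr->name] = $attr->value;
    }

    // Render the children first, so nested elements are expanded before their parent.
    $content = '';
    foreach ($node->childNodes as $child) {
        $content .= render($child, $handlers);
    }

    if (isset($handlers[$node->nodeName])) {
        // The callback decides what replaces the element; returning '' drops it entirely.
        return $handlers[$node->nodeName]($attrs, $content);
    }

    // Elements without a handler (ordinary markup) are copied through with their attributes.
    $open = $node->nodeName;
    foreach ($attrs as $name => $value) {
        $open .= ' ' . $name . '="' . htmlspecialchars($value, ENT_XML1 | ENT_COMPAT) . '"';
    }
    return "<$open>$content</{$node->nodeName}>";
}

$doc = new DOMDocument();
$doc->loadXML(
    '<page><field name="outer">a<field name="inner">b</field></field>'
    . '<if test="no">hidden</if><if test="yes">shown</if></page>'
);

$handlers = [
    // Hypothetical handlers: wrap a field's content, and include <if> content conditionally.
    'field' => fn (array $attrs, string $content): string => '[' . ($attrs['name'] ?? '') . ":$content]",
    'if'    => fn (array $attrs, string $content): string => ($attrs['test'] ?? '') === 'yes' ? $content : '',
];

$output = render($doc->documentElement, $handlers);
echo $output, "\n"; // <page>[outer:a[inner:b]]shown</page>
```

Note that a subtree's content is rendered before its callback sees it, so a dropped <if> still had its children evaluated. If skipping evaluation (not just output) matters, the callback would need to receive the DOMElement itself and call render() on demand.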

What can help me along the way?

You can use XSL to do this; just match the inner patterns first.

Here is a good starting point for learning what you can do with XSL:

http://www.w3schools.com/xsl/

You can perform the XSL transformation server-side or on the client (using JavaScript, ActiveX, or similar).

If you still dislike the idea of XSL, you could look at the XML parsing built into PHP (search for "SAX parser PHP"): it is a callback-based, event-driven API for building your own custom parser, currently backed by libxml2.
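The SAX-style route can be sketched with PHP's xml extension as follows; expand_sax() and the handler array are hypothetical names. Tag and attribute names are case-folded to upper case by default, so that option is switched off. A stack of open elements takes care of nesting: an inner element's end handler fires before its parent's, so its output is simply appended to the parent's content. For brevity, elements without a handler are copied through without their attributes.

```php
<?php
// Sketch: the same innermost-first expansion with PHP's event-based (SAX-style) xml
// extension. expand_sax() and the $handlers structure are hypothetical names.
function expand_sax(string $xml, array $handlers): string
{
    // Stack of open elements; the bottom frame collects the final output.
    $stack = [['content' => '']];

    $parser = xml_parser_create('UTF-8');
    // Tag and attribute names are upper-cased by default; keep them as written.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$stack): void {
            // Attributes arrive as an associative array, e.g. ['name' => 'outer'].
            $stack[] = ['attrs' => $attrs, 'content' => ''];
        },
        function ($p, string $name) use (&$stack, $handlers): void {
            $frame = array_pop($stack);
            $out = isset($handlers[$name])
                ? $handlers[$name]($frame['attrs'], $frame['content'])
                : "<$name>{$frame['content']}</$name>"; // attributes dropped for brevity
            // An inner element closes before its parent, so append to the parent's content.
            $stack[count($stack) - 1]['content'] .= $out;
        }
    );
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$stack): void {
        // Character data may arrive in several chunks; append each one, re-escaped.
        $stack[count($stack) - 1]['content'] .= htmlspecialchars($data, ENT_XML1);
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $stack[0]['content'];
}

$out = expand_sax(
    '<page><field name="outer">a<field name="inner">b</field></field></page>',
    ['field' => fn (array $attrs, string $content): string => '[' . ($attrs['name'] ?? '') . ":$content]"]
);
echo $out, "\n"; // <page>[outer:a[inner:b]]</page>
```

Compared with DOM, this never builds the whole tree in memory, at the cost of managing the nesting state yourself.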

PHP's XSLTProcessor class (ext/xsl; PHP 5 bundles the XSL extension, which is enabled by adding --with-xsl[=DIR] to your configure line) is quite sophisticated and, among other things, allows PHP functions to be called from within your XSL document via the XSLTProcessor::registerPHPFunctions() method.

The following example is shamelessly pinched from the PHP manual page:

$xml = '<allusers>
 <user>
  <uid>bob</uid>
 </user>
 <user>
  <uid>joe</uid>
 </user>
</allusers>';
$xsl = '<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:php="http://php.net/xsl">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
 <xsl:template match="allusers">
  <html><body>
    <h2>Users</h2>
    <table>
    <xsl:for-each select="user">
      <tr><td>
        <xsl:value-of
             select="php:function(\'ucfirst\',string(uid))"/>
      </td></tr>
    </xsl:for-each>
    </table>
  </body></html>
 </xsl:template>
</xsl:stylesheet>';
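// loadXML() is an instance method; calling it statically is an Error in PHP 8.
$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
$xsldoc = new DOMDocument();
$xsldoc->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->registerPHPFunctions(); // lets the stylesheet call PHP functions via php:function()
$proc->importStyleSheet($xsldoc);
echo $proc->transformToXML($xmldoc);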

Definitely not regexps. XML formats can change in ways that don't affect their content (in other words, changes that are invisible to XML-handling libraries) yet are significant to regexps. Such code quickly becomes a maintenance nightmare.
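To make the point concrete, here are three spellings of the same empty element that any XML library treats identically. A regex written for the first spelling (a hypothetical pattern, extended with a name attribute) matches only one of them, while DOMDocument finds the name attribute in all three.

```php
<?php
// Three spellings of the same empty <field> element; any XML library treats them alike.
$variants = [
    '<field name="x"></field>',
    "<field name='x'></field>",
    '<field name="x" />',
];

$regexHits = 0;
$domHits = 0;
foreach ($variants as $xml) {
    // A regex written for the first spelling only (hypothetical pattern).
    if (preg_match('!<field name="([^"]*)">(.*?)</field>!', $xml) === 1) {
        $regexHits++;
    }
    // The parser sees the same element with the same attribute every time.
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    if ($doc->documentElement->getAttribute('name') === 'x') {
        $domHits++;
    }
}
echo "regex: $regexHits, DOM: $domHits\n"; // regex: 1, DOM: 3
```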

As to which parser to use (SAX, StAX, DOM, JDOM, dom4j, XOM, etc.),
