
PHP - removing XML empty nodes

I found this code to remove empty nodes from an XML file, but it isn't working correctly. It leaves an empty node that really needs to be removed. The node is empty apart from some whitespace.

$domxml = new DOMDocument('1.0');
$domxml->preserveWhiteSpace = false;
$domxml->formatOutput = true;
$domxml->loadXML($this->response);
$this->response = $domxml->saveXML($domxml->documentElement);

Anyone know of a better way to do this?

In other words, you would like to remove any element node that has no text content, no attributes, and no descendants with text content or attributes, and that has a parent element node (that is, it is not the document element).

XPath has a function, normalize-space(), that collapses each whitespace sequence to a single space and strips whitespace from the start and end. Whitespace-only content therefore results in an empty string.
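A minimal sketch of what normalize-space() returns, assuming a small sample document that is made up for illustration only. When the expression yields a string, DOMXPath::evaluate() returns a scalar instead of a node list:

```php
<?php
// Hypothetical sample document, used only for illustration.
$document = new DOMDocument();
$document->loadXML('<a><b>   </b><c>  x   y  </c></a>');
$xpath = new DOMXPath($document);

// Whitespace-only content collapses to an empty string.
$blank = $xpath->evaluate('normalize-space(/a/b)');
// Inner whitespace runs collapse to one space; leading/trailing whitespace is stripped.
$text = $xpath->evaluate('normalize-space(/a/c)');

var_dump($blank); // string(0) ""
var_dump($text);  // string(3) "x y"
```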

XPath

//* selects every element node in the document and returns them as a node list. You just need to add conditions.

  • Has no text content
    normalize-space(.) = ""
  • No attributes
    not(@*)
  • No descendant node with content (includes comments, ...)
    not(.//node()[normalize-space(.) != ""])
  • No descendant element nodes with attributes
    not(.//*[@*])
  • Has a parent element node
    parent::*

Put together:

$xml = <<<'XML'
<foo>
  <bar></bar>
  <bar>123</bar>
  <bar foo="123"></bar>
  <bar><foo>   </foo></bar>
  <bar><!-- test --></bar>
</foo>
XML;

$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->formatOutput = TRUE; 
$document->loadXml($xml);
$xpath = new DOMXpath($document);

$expression = 
  '//*[
    normalize-space(.) = "" and 
    not(@*) and  
    not(.//node()[normalize-space(.) != ""]) and 
    not(.//*[@*]) and
    parent::*
  ]';

$nodes = $xpath->evaluate($expression);
for ($i = $nodes->length - 1; $i >= 0; $i--) {
  $nodes[$i]->parentNode->removeChild($nodes[$i]);
}

echo $document->saveXml();

Output:

<?xml version="1.0"?>
<foo>
  <bar>123</bar>
  <bar foo="123"/>
  <bar>
    <!-- test -->
  </bar>
</foo>
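If the cleanup is needed in more than one place, the same expression and loop can be wrapped in a function. This is only a sketch: the function name removeEmptyElements() and its return value (the number of removed elements) are assumptions for illustration, not part of the code above.

```php
<?php
/**
 * Hypothetical helper (the name is not from the original answer): removes
 * elements that have no text, no attributes and no non-empty descendants.
 * Returns the number of removed elements.
 */
function removeEmptyElements(DOMDocument $document): int
{
    $xpath = new DOMXPath($document);
    $expression =
        '//*[
            normalize-space(.) = "" and
            not(@*) and
            not(.//node()[normalize-space(.) != ""]) and
            not(.//*[@*]) and
            parent::*
        ]';

    // Copy the matches into an array and walk it backwards,
    // so descendants are removed before their ancestors.
    $nodes = array_reverse(iterator_to_array($xpath->evaluate($expression)));
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }

    return count($nodes);
}

// Sample input written on one line, so there is no indentation whitespace.
$document = new DOMDocument();
$document->preserveWhiteSpace = false;
$document->loadXML(
    '<foo><bar></bar><bar>123</bar><bar foo="123"></bar><bar><foo>   </foo></bar></foo>'
);

$removed = removeEmptyElements($document);
echo $removed, "\n";                                        // 3
echo $document->saveXML($document->documentElement), "\n";  // <foo><bar>123</bar><bar foo="123"/></foo>
```

The three removed elements are the first bar, the fourth bar and the whitespace-only foo nested inside it.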

For a more general solution that removes ALL empty elements, consider XSLT. Specifically, combine the identity transform with an empty template (one that copies and outputs nothing) that matches every element (*) whose string value is empty ([.='']).

See XSLT Fiddle Demo using top PHP and XSLT StackOverflow users where each topusers node has at least one empty child, removed entirely in the result.

XSLT (save as .xsl)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform to Copy Document as is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty Template to Remove Empty Nodes -->
  <xsl:template match="*[.='']"/>

</xsl:transform>

PHP (if needed, enable the php_xsl extension in the .ini file)

// LOAD XML
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->load('Input.xml');

// LOAD XSLT 
$xsl = new DOMDocument('1.0', 'UTF-8');   
$xsl->load('XSLT_Script.xsl');

// INITIALIZE TRANSFORMER
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);

// RUN TRANSFORMATION
$newXML = $proc->transformToXML($xml);

// OUTPUT NEW TREE AND SAVE IT TO FILE
echo $newXML;
file_put_contents('Output.xml', $newXML);
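One point worth checking before relying on the stylesheet: the predicate [.=''] tests only an element's string value, which is built from its descendant text nodes. Attributes and comments do not contribute, so elements carrying only those also match. The sketch below evaluates the same predicate with DOMXPath rather than running the XSLT, on a made-up sample document, to show which elements it selects:

```php
<?php
// Hypothetical sample, written without indentation so no whitespace text nodes exist.
$document = new DOMDocument();
$document->loadXML(
    '<foo><bar/><bar foo="123"/><bar>123</bar><bar><!-- test --></bar></foo>'
);
$xpath = new DOMXPath($document);

// Same predicate as the match pattern of the empty template.
$matches = $xpath->query('//*[. = ""]');

foreach ($matches as $node) {
    echo $document->saveXML($node), "\n";
}
// Prints:
// <bar/>
// <bar foo="123"/>
// <bar><!-- test --></bar>
```

If elements with attributes should survive the transformation, a stricter pattern such as *[not(@*) and .=''] may be a better fit; that variant is a suggestion and has not been tested against the original poster's data.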
