
How to edit (filter) large XML files in PHP based on the value of a child node

I'm trying to modify a 130 MB+ XML file with PHP so that it only contains the items where a child node has a specific value. I need this filter because of limitations in the software we use to import the XML into our website.

Example: (mockup data)

<Items>
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>true</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item>
</Items>

Desired result: I want to create a new XML file with only the records where the child "ShowOnWebsite" is true.

Problems I've run into: because the XML is so large, simple solutions such as SimpleXML, or loading the whole document and editing the nodes in it, don't work. They all read the entire file into memory, which is too slow and usually fails.

I've also looked at prewk/xml-string-streamer ( https://github.com/prewk/xml-string-streamer ), which is great for streaming large XML files because it doesn't load the whole file into memory, but I can't find any way to modify the XML with it. (Other posts online say you need the nodes in memory to edit them.)

Anyone got an idea on how to tackle this problem?

Goal

Desired result: I want to create a new XML file with only the records where the child "ShowOnWebsite" is true.

Given

test.xml

<Items>
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>true</ShowOnWebsite>
</Item> 
<Item>
  <Barcode>...</Barcode>
  <BrandCode>...</BrandCode>
  <Title>...</Title>
  <Content>...</Content>
  <ShowOnWebsite>false</ShowOnWebsite>
</Item>
</Items>

Code

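This is the implementation I wrote. The xmlByTheElement() generator yields one <Item> element at a time, so the whole XML file is never loaded into memory at once. Because it splits the input by lines, it assumes that every <Item> and </Item> tag sits on its own line, as in test.xml.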

<?php

/**
 * Yields each <Item> element as a SimpleXMLElement while reading the file
 * line by line, so only one item is held in memory at a time.
 * Assumes "<Item>" and "</Item>" each sit on their own line.
 */
function xmlByTheElement(string $fileName): Generator
{
    $file = fopen($fileName, 'r');
    if ($file === false) {
        return;
    }
    $buffer = '';
    $active = false;
    while (($line = fgets($file)) !== false) {
        $line = trim($line); // trim() also removes "\r" and "\n"
        if ($line === '<Item>') {
            $buffer = $line;
            $active = true;
        } elseif ($line === '</Item>') {
            $buffer .= $line;
            $active = false;
            yield new SimpleXMLElement($buffer);
            $buffer = '';
        } elseif ($active) {
            $buffer .= $line;
        }
    }
    fclose($file);
}

$output = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><Items></Items>');

foreach (xmlByTheElement('test.xml') as $element) {
    if ((string) $element->ShowOnWebsite === 'true') {
        $item = $output->addChild('Item');
        foreach (['Barcode', 'BrandCode', 'Title', 'Content', 'ShowOnWebsite'] as $field) {
            // addChild() does not escape "&" in values, so escape the text first
            $item->addChild($field, htmlspecialchars((string) $element->$field, ENT_XML1));
        }
    }
}

echo $output->asXML();

Output

<?xml version="1.0" encoding="utf-8"?>
<Items><Item><Barcode>...</Barcode><BrandCode>...</BrandCode><Title>...</Title><Content>...</Content><ShowOnWebsite>true</ShowOnWebsite></Item></Items>
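The $output document above still accumulates every matching item in memory until asXML() runs, which may become a problem if many items in a 130 MB file match. A minimal sketch, assuming only PHP's bundled XMLWriter and DOM extensions, writes each match to the target file as soon as it is read. The function name writeVisibleItems() is hypothetical (not part of the answer above); it accepts any iterable of <Item> elements, such as the generator returned by xmlByTheElement().

```php
<?php
// Hypothetical helper (not part of the original answer): streams matching
// <Item> elements to $targetFile instead of collecting them in $output.
// $items can be any iterable of SimpleXMLElement <Item> nodes, e.g. the
// generator returned by xmlByTheElement(). Returns the number written.
function writeVisibleItems(iterable $items, string $targetFile): int
{
    $writer = new XMLWriter();
    if (!$writer->openUri($targetFile)) {
        throw new RuntimeException("Cannot open $targetFile for writing");
    }
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('Items');

    $count = 0;
    foreach ($items as $item) {
        if ((string) $item->ShowOnWebsite !== 'true') {
            continue;
        }
        // Serialize just this element (no XML declaration) and append it as-is
        $node = dom_import_simplexml($item);
        $writer->writeRaw($node->ownerDocument->saveXML($node));
        $count++;
    }

    $writer->endElement(); // </Items>
    $writer->endDocument();
    $writer->flush();
    return $count;
}
```

For example, writeVisibleItems(xmlByTheElement('test.xml'), 'filtered.xml') would write the filtered file directly, where filtered.xml is a placeholder name. Since saveXML() serializes the whole element, every child node is copied, not just the five fields listed by name.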

XMLReader has an expand() method, but XMLWriter lacks a counterpart, so I added an XMLWriter::collapse() method to FluentDOM.

This lets you read the XML with XMLReader, expand each element into DOM, filter or manipulate it with DOM methods, and write it back with XMLWriter:
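For comparison, a similar read, expand, filter, write loop can be sketched with only PHP's bundled XMLReader, DOM and XMLWriter extensions, without FluentDOM. This is an approximation, not FluentDOM's implementation: XMLWriter::writeRaw() combined with DOMDocument::saveXML() stands in for collapse(), and DOMXPath::evaluate() checks the same ShowOnWebsite = "true" condition. The function name filterItems() is hypothetical. Because XMLReader parses the markup rather than splitting lines, this version does not depend on how the input is line-broken.

```php
<?php
// Hypothetical sketch (not FluentDOM's code): the same read -> expand ->
// filter -> write flow using only XMLReader, DOM and XMLWriter.
// Returns the number of <Item> elements copied to $targetFile.
function filterItems(string $sourceFile, string $targetFile): int
{
    $reader = new XMLReader();
    if (!$reader->open($sourceFile)) {
        throw new RuntimeException("Cannot open $sourceFile");
    }
    $writer = new XMLWriter();
    if (!$writer->openUri($targetFile)) {
        throw new RuntimeException("Cannot open $targetFile for writing");
    }
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('Items');

    $doc = new DOMDocument();
    $xpath = new DOMXPath($doc);
    $count = 0;

    // Advance to the first <Item> start tag
    $found = false;
    while (!$found && $reader->read()) {
        $found = $reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'Item';
    }
    while ($found) {
        // expand($doc) builds only the current <Item> subtree, owned by $doc
        $item = $reader->expand($doc);
        if ($xpath->evaluate('ShowOnWebsite = "true"', $item)) {
            // writeRaw() + saveXML() plays the role of collapse()
            $writer->writeRaw($doc->saveXML($item));
            $count++;
        }
        // Skip the subtree and move to the next sibling named <Item>
        $found = $reader->next('Item');
    }
    $reader->close();

    $writer->endElement(); // </Items>
    $writer->endDocument();
    $writer->flush();
    return $count;
}
```

Only one <Item> subtree is built as DOM at a time, so memory use should be bounded by the largest single item rather than by the size of the file.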

<?php
require __DIR__.'/../../vendor/autoload.php'; // Composer autoloader; adjust the path to your project layout

// Create the target writer and add the root element
$writer = new \FluentDOM\XMLWriter();
$writer->openUri('php://stdout');
$writer->setIndent(2);
$writer->startDocument();
$writer->startElement('Items');

// load the source into a reader
$reader = new \FluentDOM\XMLReader();
$reader->open(getXMLAsURI()); // demo data; pass the path of the real XML file instead

// iterate the Item elements - the iterator expands each one into a DOM node
foreach (new FluentDOM\XMLReader\SiblingIterator($reader, 'Item') as $item) {
  /** @var \FluentDOM\DOM\Element $item */
  // only "ShowOnWebsite = true"
  if ($item('ShowOnWebsite = "true"')) {
    // write expanded node to the output
    $writer->collapse($item);
  }
}

$writer->endElement();
$writer->endDocument();

function getXMLAsURI() {
  $xml = <<<'XML'
<Items>
  <Item>
    <Barcode>...</Barcode>
    <BrandCode>...</BrandCode>
    <Title>...</Title>
    <Content>...</Content>
    <ShowOnWebsite>false</ShowOnWebsite>
  </Item> 
  <Item>
    <Barcode>...</Barcode>
    <BrandCode>...</BrandCode>
    <Title>...</Title>
    <Content>...</Content>
    <ShowOnWebsite>true</ShowOnWebsite>
  </Item> 
  <Item>
    <Barcode>...</Barcode>
    <BrandCode>...</BrandCode>
    <Title>...</Title>
    <Content>...</Content>
    <ShowOnWebsite>false</ShowOnWebsite>
  </Item>
</Items>
XML;
  return 'data://text/plain;base64,'.base64_encode($xml);
}
