Need to parse a very large XML file in PHP

Question

Hi, I'm trying to use XMLParser to parse an XML file.

I need to end up with something like this:

productID: xerox_106r0116 name: Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)

However, what I get is this:

text: xerox_106r0116

text: Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)

Does anyone know how to parse this properly with XMLParser?

The XML is below:

<?xml version="1.0" encoding="utf-8"?>
<products>
<product>
<productID>xerox_106r01163</productID>
<name>Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)</name>
<price currency="EUR">165.77</price>
<productURL>http://www.centralpoint.nl/tracker/index.php?tt=534_251713_1_&amp;r=http%3A%2F%2Fwww.centralpoint.nl%2Ftoners-laser-cartridges%2Fxerox%2Fzwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-art-106r01163-num-17879%2F</productURL>
<imageURL>https://www02.cp-static.com/objects/low_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</imageURL>
<description><![CDATA[Black Toner Cartridge, Phaser 7760
Our Phaser 7760
 toner cartridges utilize a revolutionary toner manufacturing process where toner is chemically grown and processed into very small and consistent particles, resulting in sharper, high-gloss image quality, an increased range of colors, enhanced fine-line detail and superior reliability. Our longer-life toner cartridges reduce the need for customer interaction, and the Black toner cartridges print up to 32,000 pages each at 5% average area coverage.]]></description>
<categories>
<category path="toners &amp; lasercartridges">toners &amp; lasercartridges</category>
</categories>
<additional>
<field name="brand">Xerox</field>
<field name="producttype">zwart, tonercartridge met grote capaciteit (tot 32.000 pag.)</field>
<field name="deliveryCosts">0.00</field>
<field name="SKU">106R01163</field>
<field name="brand_and_type">Xerox 106R01163</field>
<field name="stock">Op voorraad</field>
<field name="thumbnailURL">https://www02.cp-static.com/objects/thumb_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</field>
<field name="deliveryTime">1 werkdag</field>
<field name="imageURLlarge">https://www02.cp-static.com/objects/high_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</field>
<field name="categoryURL">http://www.centralpoint.nl/toners-laser-cartridges/</field>
<field name="EAN">0095205224016</field>
</additional>
</product>
</products>
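The question's handler code isn't shown, so this is only a guess at the cause: output like "text: ..." usually comes from a character-data handler that prints every text node without knowing which element it belongs to. Below is a minimal sketch using PHP's event-based XML Parser functions. It assumes the products/product layout shown above, remembers the current element name, and accumulates text until the matching end tag. parseProducts() and its callback are hypothetical names. The file is fed in chunks, so memory use stays flat even for a very large file.

```php
<?php
// Minimal sketch using PHP's event-based XML Parser (expat) functions.
// parseProducts() is a hypothetical helper; it assumes the <products>/<product>
// layout from the question and collects only <productID> and <name>.
function parseProducts(string $file, callable $onProduct): void
{
    $parser = xml_parser_create('UTF-8');
    // By default element names are folded to upper case; keep them as written.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    $current = null; // element whose text is being collected
    $text = '';      // text collected so far for $current
    $product = [];   // fields of the <product> being built

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$current, &$text, &$product) {
            if ($name === 'product') {
                $product = [];
            }
            $current = $name;
            $text = '';
        },
        function ($p, string $name) use (&$current, &$text, &$product, $onProduct) {
            if ($name === 'productID' || $name === 'name') {
                $product[$name] = $text;
            } elseif ($name === 'product') {
                $onProduct($product); // one complete product per call
            }
            $current = null;
            $text = '';
        }
    );

    // Text may arrive in several pieces (e.g. around &amp;), so append it.
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$current, &$text) {
        if ($current !== null) {
            $text .= $data;
        }
    });

    $fh = fopen($file, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    // Feed the file in 8 KB chunks; the last call passes is_final = true.
    while (!feof($fh)) {
        $chunk = fread($fh, 8192);
        if (!xml_parse($parser, $chunk, feof($fh))) {
            $msg = sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            );
            fclose($fh);
            throw new RuntimeException($msg);
        }
    }
    fclose($fh);
}
```

Passing a callback that echoes "productID: " and "name: " with the two array values gives one line per product, in the format the question asks for.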

Answer 1

For very large files I use a combination of XMLReader (which acts as a forward-only cursor over the document stream) and SimpleXMLElement. In your case it would be something like this:

// $FILE_NAME is the path to your XML file
$xml = new XMLReader();
if (!$xml->open($FILE_NAME)) {
    die("Error opening the XML file");
}

// Process the XML with the product list, one <product> at a time
while ($xml->read()) {
    if ($xml->nodeType == XMLReader::ELEMENT && $xml->name == 'product') {
        $product_xml = $xml->readOuterXml();

        // Load only this <product> fragment into SimpleXML.
        // libxml flags are combined with bitwise OR (|), not logical AND (&&).
        $product = simplexml_load_string($product_xml, 'SimpleXMLElement', LIBXML_NOBLANKS | LIBXML_NOWARNING);
        $product_id = (string)$product->productID;
        $product_name = (string)$product->name;

        // Then do something with $product_id and $product_name...
        echo "productID: " . $product_id . " name: " . $product_name . "\n";
    }
}
$xml->close();

Hope this helps.
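The same XMLReader + SimpleXML idea can also be packaged as a generator. This is a sketch, not the answerer's code: readProducts() is a hypothetical name, and it uses XMLReader::next('product') to jump from one product to the next sibling product instead of reading through every child node.

```php
<?php
// Sketch of the XMLReader + SimpleXML approach wrapped in a generator.
// readProducts() is a hypothetical helper; it yields one SimpleXMLElement
// per <product>, so only one product is held in memory at a time.
function readProducts(string $file): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    // Advance the cursor to the first <product> start tag.
    while ($reader->read() && $reader->name !== 'product') {
        // keep reading
    }

    // Materialise the current <product> subtree, then skip to the next
    // sibling <product> without walking through this one's children.
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        yield new SimpleXMLElement($reader->readOuterXml());
        $reader->next('product');
    }

    $reader->close();
}
```

Usage mirrors the loop above: iterate with foreach and cast $product->productID and $product->name to string.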

Answer 2

Here is what I use for XML parsing. It loads the XML document into a DOMDocument object, and you can work from there. See if that helps.

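    // Note: DOMDocument builds the whole tree in memory
    $xml = new DOMDocument();
    $xml->preserveWhiteSpace = false;
    // loadXML() does not throw on malformed XML; it raises warnings and returns false
    $previous = libxml_use_internal_errors(true);
    $ok = $xml->loadXML($string, LIBXML_NSCLEAN);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        throw new Exception('Invalid XML structure');
    }
    return $xml;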
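Answer 2 stops at returning the DOMDocument. A sketch of pulling out the two fields the question wants, assuming the product feed shown above, could look like this; productLines() is a hypothetical helper. Because DOMDocument holds the whole tree in memory, this suits smaller files better than the streaming approaches.

```php
<?php
// Sketch of "working from" a loaded DOMDocument, assuming the product feed
// from the question. productLines() is a hypothetical helper.
function productLines(DOMDocument $doc): array
{
    $lines = [];
    foreach ($doc->getElementsByTagName('product') as $product) {
        // item(0) returns null when the child element is missing
        $id = $product->getElementsByTagName('productID')->item(0);
        $name = $product->getElementsByTagName('name')->item(0);
        $lines[] = sprintf(
            'productID: %s name: %s',
            $id !== null ? $id->textContent : '',
            $name !== null ? $name->textContent : ''
        );
    }
    return $lines;
}
```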

2 answers

Answer 1: 3 votes, posted 2015-07-09 11:08:48

Answer 2: 0 votes, posted 2015-07-09 11:02:16
