Need to parse a very large XML file in PHP

Hi, I'm trying to use XMLParser to parse an XML file.

I need to end up with something like this:

productID: xerox_106r0116
name: Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)

However, what I get is this:

text: xerox_106r0116
text: Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)

Does anyone know how to parse this properly with XMLParser?

The XML is below:

<?xml version="1.0" encoding="utf-8"?>
<products>
<product>
<productID>xerox_106r01163</productID>
<name>Xerox zwart, tonercartridge met grote capaciteit (tot 32.000 pag.) (106R01163)</name>
<price currency="EUR">165.77</price>
<productURL>http://www.centralpoint.nl/tracker/index.php?tt=534_251713_1_&amp;r=http%3A%2F%2Fwww.centralpoint.nl%2Ftoners-laser-cartridges%2Fxerox%2Fzwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-art-106r01163-num-17879%2F</productURL>
<imageURL>https://www02.cp-static.com/objects/low_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</imageURL>
<description><![CDATA[Black Toner Cartridge, Phaser 7760
Our Phaser 7760
 toner cartridges utilize a revolutionary toner manufacturing process where toner is chemically grown and processed into very small and consistent particles, resulting in sharper, high-gloss image quality, an increased range of colors, enhanced fine-line detail and superior reliability. Our longer-life toner cartridges reduce the need for customer interaction, and the Black toner cartridges print up to 32,000 pages each at 5% average area coverage.]]></description>
<categories>
<category path="toners &amp; lasercartridges">toners &amp; lasercartridges</category>
</categories>
<additional>
<field name="brand">Xerox</field>
<field name="producttype">zwart, tonercartridge met grote capaciteit (tot 32.000 pag.)</field>
<field name="deliveryCosts">0.00</field>
<field name="SKU">106R01163</field>
<field name="brand_and_type">Xerox 106R01163</field>
<field name="stock">Op voorraad</field>
<field name="thumbnailURL">https://www02.cp-static.com/objects/thumb_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</field>
<field name="deliveryTime">1 werkdag</field>
<field name="imageURLlarge">https://www02.cp-static.com/objects/high_pic/3/3a9/117949_toners-laser-cartridges-xerox-zwart-tonercartridge-met-grote-capaciteit-tot-32000-pag-106r01163.jpg</field>
<field name="categoryURL">http://www.centralpoint.nl/toners-laser-cartridges/</field>
<field name="EAN">0095205224016</field>
</additional>
</product>
</products>

For very large files I use a combination of XMLReader (which acts as a forward-only cursor over the document stream) and SimpleXMLElement. XMLReader walks the file without loading it all into memory, and each <product> subtree is handed to SimpleXML on its own. In your case it would be something like this:

$xml = new XMLReader();
if(!$xml->open($FILE_NAME)){
    die("Error opening the XML file");
}

//Process XML with the product list
while($xml->read()){
    if($xml->nodeType==XMLReader::ELEMENT && $xml->name == 'product'){
        $product_xml = $xml->readOuterXml();

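        // Load the <product> subtree into SimpleXML and read its child elements.
        // Flags are combined with bitwise OR (|); && would collapse them to true.
        $product = simplexml_load_string($product_xml, 'SimpleXMLElement', LIBXML_NOBLANKS | LIBXML_NOWARNING);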
        $product_id = (string)$product->productID;
        $product_name = (string)$product->name;

        //Then do something with product_id and product_name...
        echo "ProductID: ".$product_id." name:".$product_name;
    }
}
$xml->close();
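
A variation on the loop above, offered only as a sketch: it wraps the same XMLReader plus SimpleXML approach in a generator and uses XMLReader::next('product') to jump from one <product> to the next without visiting the nodes inside it. The function name iterateProducts and the shape of the returned array are assumptions made for illustration, and the price and currency fields are taken from the sample XML in the question.

```php
<?php
// Hypothetical helper (name and return shape are illustrative, not from the answer).
// Streams <product> elements from a file one at a time.
function iterateProducts(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Error opening the XML file: $path");
    }

    try {
        // Walk forward until the first <product> element is reached.
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
                do {
                    // Only this one subtree is parsed into memory.
                    $product = simplexml_load_string($reader->readOuterXml());
                    if ($product === false) {
                        continue; // skip a subtree that fails to parse
                    }
                    yield [
                        'productID' => (string) $product->productID,
                        'name'      => (string) $product->name,
                        'price'     => (string) $product->price,
                        // Attributes are read with array syntax on the element.
                        'currency'  => (string) $product->price['currency'],
                    ];
                    // next('product') skips the current subtree and moves to
                    // the following <product>, if there is one.
                } while ($reader->next('product'));
                break;
            }
        }
    } finally {
        $reader->close();
    }
}
```

It could be used as foreach (iterateProducts($FILE_NAME) as $row) { ... }, so that only one product is held in memory at a time.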

Hope this helps.

Here is what I use for XML parsing. It loads the XML document into a DOMDocument object, and you can work from that. See if that helps.

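    $xml = new DOMDocument();
    $xml->preserveWhiteSpace = false;
    // loadXML() reports malformed input by returning false (with libxml
    // warnings), not by throwing, so check the return value.
    $previous = libxml_use_internal_errors(true);
    $loaded = $xml->loadXML($string, LIBXML_NSCLEAN);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        throw new Exception('Invalid XML structure');
    }
    return $xml;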
