How to parse an XML file with multiple XML declarations using PHP? (A concatenation of several XML files)

Format of the XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE >
<root>
 <node>
  <element1></element1>
  <element2></element2>
  <element3></element3>
  <element4></element4>
 </node>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE >
<root>
 <node>
  <element1></element1>
  <element2></element2>
  <element3></element3>
  <element4></element4>
 </node>
</root>

and several more XML documents follow, each with its own declaration. By the way, the file is 500MB. How can I parse this file with PHP without splitting it into separate files?

Any help would be appreciated. Thank you.

If you do not want to split the file, you will have to work with it in memory. Given your 500MB file size, this could turn out to be problematic. Anyway, one option would be to remove the XML prolog and DocType from all documents and then load the whole thing like this:

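// Read the whole file, strip every per-document prolog and DOCTYPE,
// then wrap what is left in a single <roots> element under one prolog and DOCTYPE.
$dom = new DOMDocument;
$dom->loadXML(
    sprintf(
        '<?xml version="1.0" encoding="UTF-8"?>%s' .
        '<!DOCTYPE >%s' .
        '<roots>%s</roots>',
        PHP_EOL,
        PHP_EOL,
        str_replace(
            array(
                '<?xml version="1.0" encoding="UTF-8"?>',
                '<!DOCTYPE >'
            ),
            '',
            file_get_contents('/path/to/your/file.xml')
        )
    )
);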

This would make it one huge XML file with just one XML prolog and one DocType (note I am assuming the DocType is the same for all documents in the file). You could then process the file by iterating over the individual root elements.
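Iterating over the individual root elements could look roughly like the sketch below. It is only an illustration under a few assumptions: the input is a small inline string instead of the real file, `<!DOCTYPE root>` is a made-up stand-in for the DocType that is elided in the question, and the DocType is left out of the wrapper document for the same reason. The variable names are hypothetical.

```php
<?php
// Hypothetical sample standing in for the concatenated file. The real DOCTYPE
// content is not shown in the question, so "<!DOCTYPE root>" is an assumption.
$prolog  = '<?xml version="1.0" encoding="UTF-8"?>';
$doctype = '<!DOCTYPE root>';
$oneDoc  = $prolog . "\n" . $doctype . "\n"
         . '<root><node><element1>a</element1></node></root>' . "\n";
$concatenated = $oneDoc . str_replace('>a<', '>b<', $oneDoc);

// Strip every prolog and DOCTYPE, then wrap the rest in a single <roots> element.
$body = str_replace(array($prolog, $doctype), '', $concatenated);

$dom = new DOMDocument;
$dom->loadXML($prolog . "\n" . '<roots>' . $body . '</roots>');

// Each element child of <roots> is the root of one original document.
$values = array();
foreach ($dom->documentElement->childNodes as $child) {
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue; // skip the whitespace text nodes left between documents
    }
    $values[] = $child->getElementsByTagName('element1')->item(0)->textContent;
}

print_r($values);
```

With the real file you would replace the sample string with the `file_get_contents()` call from above. The memory caveat still applies, since the whole document is held in the DOM at once.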
