
Querying large XML file (600mb+) in PHP or JavaScript?

Original post: 2010-02-08 13:59:59 | Tags: php, javascript, xml

I have a large XML file (600 MB+) and am developing a PHP application that needs to query it.

My initial approach was to extract all the data from the file and insert it into a MySQL database, then query it that way. The only issue was that it was still slow, and the XML data gets updated regularly, meaning I need to download, parse and insert the data into the database again every time the XML file is updated.

Is it actually possible to query a 600 MB file directly (for example, searching for records where TITLE="something here")? And can it be done in a reasonable amount of time?

Ideally I would like to do this in PHP, though I could also use JavaScript.

Any help and suggestions appreciated :)

1 Answer

Constructing an XML DOM for a 600+ MB document is definitely a way to fail, since the whole tree has to be held in memory. What you need is a SAX-based API. SAX does not usually allow XPath to be used, but you can emulate it with imperative code.
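To make the SAX suggestion concrete, here is a minimal sketch using PHP's built-in expat parser functions (xml_parser_create and friends). The question does not show the file's layout, so the element names RECORD and TITLE, the flat record structure, and the helper name findRecordsByTitle are all assumptions for illustration. The handlers play the role of the XPath query /*/RECORD[TITLE="..."], written as imperative code.

```php
<?php
// Sketch only: streams the file in 8 KB chunks, so memory use stays flat
// regardless of file size. RECORD/TITLE are assumed element names, and each
// record is assumed to contain only flat child elements with text content.

function findRecordsByTitle(string $path, string $wantedTitle): array
{
    $matches = [];   // matching records, e.g. ['TITLE' => '...', 'YEAR' => '...']
    $record  = null; // record being assembled, or null while outside <RECORD>
    $field   = null; // name of the child element currently being read

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$record, &$field) {
            if ($name === 'RECORD') {
                $record = [];            // start a fresh record
            } elseif ($record !== null) {
                $field = $name;          // start collecting this field's text
                $record[$field] = '';
            }
        },
        function ($p, string $name) use (&$record, &$field, &$matches, $wantedTitle) {
            if ($name === 'RECORD') {
                // The "query": keep the record only if its TITLE matches.
                if (($record['TITLE'] ?? null) === $wantedTitle) {
                    $matches[] = $record;
                }
                $record = null;          // discard it either way
            }
            $field = null;
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$record, &$field) {
            // Text may arrive in several pieces, so append rather than assign.
            if ($record !== null && $field !== null) {
                $record[$field] .= $data;
            }
        }
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (!feof($handle)) {
        $chunk = (string) fread($handle, 8192);
        if (!xml_parse($parser, $chunk, feof($handle))) {
            $error = xml_error_string(xml_get_error_code($parser));
            $line  = xml_get_current_line_number($parser);
            fclose($handle);
            throw new RuntimeException("XML error: $error at line $line");
        }
    }
    fclose($handle);

    return $matches;
}

// Usage with a tiny stand-in for the real 600 MB file.
$sample = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($sample, <<<XML
<LIBRARY>
  <RECORD><TITLE>Something here</TITLE><YEAR>2010</YEAR></RECORD>
  <RECORD><TITLE>Something else</TITLE><YEAR>2009</YEAR></RECORD>
</LIBRARY>
XML);
$found = findRecordsByTitle($sample, 'Something here');
echo count($found), " matching record(s)\n";
unlink($sample);
```

Every query still reads the whole file once, so this suits occasional lookups rather than many queries per request. PHP's XMLReader class is a pull-based alternative that streams in the same way.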

As for the file being updated, is it possible to retrieve only the differences somehow? That would massively speed up subsequent processing.