How to process large XML files with jQuery/JavaScript/PHP faster

I'm making a store overview page that renders roughly 20 products per page. I'm getting my data from a gzipped XML file (*.xml.gz). The feed is at http://www.endclothing.com/media/end_feeds/awin_affiliates_eu.xml.gz and once a day I download it to my server with PHP and extract the XML file.

The problem is that the unzipped XML file is roughly 60 MB and contains over 50k products. When I try to read products from the XML file and display them, it is very slow: it takes about 8 seconds to display product information from the local XML file with the code below:

$.ajax({
    type: "GET",
    url: 'feeds/awin_affiliates_eu.xml',
    cache: true,
    dataType: "xml",

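    error: function (jqXHR, textStatus, errorThrown) {
        alert("An error occurred while processing XML file");
        // jQuery passes (jqXHR, textStatus, errorThrown); the original logged an undefined `e`
        console.log('XML reading failed: ', textStatus, errorThrown);
    },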

    success: function (response) {
        var max = 20;
        $(response).find("product").each(function (i) {

            if (i < max) {

                var _pid = $(this).find('pid').text();
                var _mpn = $(this).find('mpn').text();
                var _colour = $(this).find('colour').text();
                var _name = $(this).find('name').text();
                var _purl = $(this).find('purl').text();
                var _instock = $(this).find('instock').text();
                var _brand = $(this).find('brand').text();
                var _suitable_for = $(this).find('suitable_for').text();
                var _ptype = $(this).find('ptype').text();
                var _category = $(this).find('category').text();
                var _condition = $(this).find('condition').text();
                var _desc = $(this).find('desc').text();
                var _currency = $(this).find('currency').text();
                var _custom1 = $(this).find('custom1').text();
                var _price = $(this).find('price').text();
                var _deltime = $(this).find('deltime').text();
                var _delcost = $(this).find('delcost').text();
                var _imgurl = $(this).find('imgurl').text();
                var _alternate_image = $(this).find('alternate_image').text();

                $("h2._name").eq(i).text(_name);
                $(".price").eq(i).text(_price);
                var background_url = "url(" + _imgurl + ")";
                $(".panel").eq(i).css("background", background_url);

            } else {

                return false;
            }
        });
        console.log('done reading file');
    }
});

Is there any way the XML file can be read faster so I can render my products more efficiently?

PHP has XMLReader/XMLWriter for large XML files. The XML you generate per page is not large (its size depends on the products-per-page limit), so you can use DOM for writing it and only need XMLReader for reading the feed.

Here is an example with a reduced XML:

$data = <<<'XML'
<merchant xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
  <title>End | Globally Sourced Menswear</title>
  <product><name>Comme des Garcons Play Full Zip Hoody</name></product>
  <product><name>Pharrell: Places &amp; Spaces I've Been - Pink Cover</name></product>
  <product><name>The Rig Out Issue 6</name></product>
  <product><name>Baxter of California Beard Comb</name></product>
  <product><name>Baxter of California Comb</name></product>
</merchant>
XML;

$template = <<<'XML'
<merchant xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0"/>
XML;

$reader = new XMLReader();
$reader->open('data://text/plain;base64,'.base64_encode($data));

// prepare the target document
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->loadXML($template);

// iterate to the first product element
do {
  $found = $reader->read();
} while ($found && $reader->localName !== 'product');

$offset = 0;
$limit = 2;
$end = $offset + $limit;

$i = 0;
while ($found && $i < $end) {
  if ($offset <= $i) {
    // expand the current "product" and append it to the "merchant" node
    $document->documentElement->appendChild($reader->expand($document));
  }
  $i++;
  $found = $reader->next('product');
}

$document->formatOutput = TRUE;
echo $document->saveXML();

Output:

<?xml version="1.0"?>
<merchant xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
  <product>
    <name>Comme des Garcons Play Full Zip Hoody</name>
  </product>
  <product>
    <name>Pharrell: Places &amp; Spaces I've Been - Pink Cover</name>
  </product>
</merchant>

If you run the script against the original file for multiple offsets (paging), the duration increases with the offset, because XMLReader still has to iterate over the product nodes before the offset is reached. However, you could do this in the script that downloads the feed and avoid the work in the requests. Here are some results for a 20 product limit on my machine:

[Page] => Duration
[1] => 3ms
[51] => 14ms
[101] => 25ms
[151] => 35ms
[201] => 44ms
[251] => 55ms
[301] => 66ms
[351] => 91ms
[401] => 95ms
[451] => 110ms

You should also consider parsing the file (with XMLReader+DOM) into a database (SQLite, ...) or a search index (Elasticsearch, ...). This would allow you to generate filtered results.
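
To give an idea of what filtered results could look like once the products are in a database, here is a minimal JavaScript sketch that only builds a parameterized SQL query with paging. Everything in it is an assumption for illustration: the `products` table, its column names (taken from the feed's element names), the `buildProductQuery` helper and the 1-based page numbers. It does not execute the query; you would pass the result to whatever database driver you use.

```javascript
// Hypothetical table "products" with columns named after the feed elements.
function buildProductQuery(filters, page, limit) {
  // Whitelist of filterable columns, so column names never come from user input.
  const allowed = ['brand', 'category', 'colour', 'instock'];
  const where = [];
  const params = [];
  for (const column of allowed) {
    if (filters[column] !== undefined) {
      where.push(`${column} = ?`);
      params.push(filters[column]);
    }
  }
  const sql =
    'SELECT pid, name, price, imgurl FROM products' +
    (where.length ? ' WHERE ' + where.join(' AND ') : '') +
    ' ORDER BY pid LIMIT ? OFFSET ?';
  params.push(limit, (page - 1) * limit); // assumption: pages are 1-based
  return { sql, params };
}

const query = buildProductQuery({ brand: 'Baxter of California' }, 3, 20);
console.log(query.sql);
console.log(query.params);
```

With an index on the filtered columns, the database can jump to the matching rows instead of streaming through the whole feed for every request.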

PS: By the way, your XML file looks broken. It defines Atom as the default namespace, and I cannot see any element using the g prefix defined for the Google namespace. I would expect merchant and product to be part of that namespace.
