
How to process large XML files with jQuery/Javascript/PHP faster

I'm making a store overview page that renders about 20 products per page. I'm getting my data from a gzipped XML file (*.xml.gz). Here's the feed: http://www.endclothing.com/media/end_feeds/awin_affiliates_eu.xml.gz Once a day I download the file to my server with PHP and extract the XML file.
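
For context, the daily download/extract step looks roughly like this (the file paths are simplified placeholders):

$feedUrl = 'http://www.endclothing.com/media/end_feeds/awin_affiliates_eu.xml.gz';
$gzFile  = 'feeds/awin_affiliates_eu.xml.gz';
$xmlFile = 'feeds/awin_affiliates_eu.xml';

// download the compressed feed (requires allow_url_fopen)
copy($feedUrl, $gzFile);

// stream-decompress so the ~60MB result never has to fit in memory at once
$gz  = gzopen($gzFile, 'rb');
$out = fopen($xmlFile, 'wb');
while (!gzeof($gz)) {
    fwrite($out, gzread($gz, 65536));
}
gzclose($gz);
fclose($out);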

The problem is, the unzipped XML file is about 60MB and contains over 50k products. When I try to read and display products from the XML file, it goes very slowly: it takes about 8 seconds to display product information from a local XML file with the code I use below:

$.ajax({
    type: "GET",
    url: 'feeds/awin_affiliates_eu.xml',
    cache: true,
    dataType: "xml",

    error: function (response) {
        alert("An error occurred while processing the XML file");
        console.log('XML reading failed: ', response);
    },

    success: function (response) {
        var max = 20;
        $(response).find("product").each(function (i) {

            if (i < max) {

                var _pid = $(this).find('pid').text();
                var _mpn = $(this).find('mpn').text();
                var _colour = $(this).find('colour').text();
                var _name = $(this).find('name').text();
                var _purl = $(this).find('purl').text();
                var _instock = $(this).find('instock').text();
                var _brand = $(this).find('brand').text();
                var _suitable_for = $(this).find('suitable_for').text();
                var _ptype = $(this).find('ptype').text();
                var _category = $(this).find('category').text();
                var _condition = $(this).find('condition').text();
                var _desc = $(this).find('desc').text();
                var _currency = $(this).find('currency').text();
                var _custom1 = $(this).find('custom1').text();
                var _price = $(this).find('price').text();
                var _deltime = $(this).find('deltime').text();
                var _delcost = $(this).find('delcost').text();
                var _imgurl = $(this).find('imgurl').text();
                var _alternate_image = $(this).find('alternate_image').text();

                $("h2._name").eq(i).text(_name);
                $(".price").eq(i).text(_price);
                var background_url = "url(" + _imgurl + ")";
                $(".panel").eq(i).css("background", background_url);

            } else {

                return false;
            }
        });
        console.log('done reading file');
    }
});

Is there any way the XML file can be read faster so I can render my products more efficiently?

PHP has XMLReader/XMLWriter for large XML files. The XML you generate per page is not large (it depends on the products-per-page limit), so you can use DOM for the writing and will only need XMLReader for the reading.

Here is an example with a reduced XML:

$data = <<<'XML'
<merchant xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
  <title>End | Globally Sourced Menswear</title>
  <product><name>Comme des Garcons Play Full Zip Hoody</name></product>
  <product><name>Pharrell: Places &amp; Spaces I've Been - Pink Cover</name></product>
  <product><name>The Rig Out Issue 6</name></product>
  <product><name>Baxter of California Beard Comb</name></product>
  <product><name>Baxter of California Comb</name></product>
</merchant>
XML;

$template = <<<'XML'
<merchant xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0"/>
XML;

$reader = new XMLReader();
// the example reads the XML from a data:// URI; point open() at the real file instead
$reader->open('data://text/plain;base64,'.base64_encode($data));

// prepare the target document
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->loadXML($template);

// iterate to the first product element
do {
  $found = $reader->read();
} while ($found && $reader->localName !== 'product');

// paging window: skip the first $offset products, copy $limit products
$offset = 0;
$limit = 2;
$end = $offset + $limit;

$i = 0;
while ($found && $i < $end) {
  if ($offset <= $i) {
    // expand the current "product" and append it to the "merchant" node
    $document->documentElement->appendChild($reader->expand($document));
  }
  $i++;
  $found = $reader->next('product');
}

$document->formatOutput = TRUE;
echo $document->saveXML();

Output:

<?xml version="1.0"?>
<merchant xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
  <product>
    <name>Comme des Garcons Play Full Zip Hoody</name>
  </product>
  <product>
    <name>Pharrell: Places &amp; Spaces I've Been - Pink Cover</name>
  </product>
</merchant>

Using the script on the original file with multiple offsets (paging), the duration will increase with the page number because XMLReader still has to iterate over all the product nodes before the offset is reached. However, you could do this in the script that downloads the feed and avoid the work in the requests (see the sketch after the timing table). Here are some results for a 20-product limit on my machine:

[Page] => Duration
[1] => 3ms
[51] => 14ms
[101] => 25ms
[151] => 35ms
[201] => 44ms
[251] => 55ms
[301] => 66ms
[351] => 91ms
[401] => 95ms
[451] => 110ms
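
Pre-generating the pages could look something like this sketch; the output file names and the 20-product page size are assumptions:

$reader = new XMLReader();
$reader->open('feeds/awin_affiliates_eu.xml');

// iterate to the first product element
do {
  $found = $reader->read();
} while ($found && $reader->localName !== 'product');

$limit = 20;
$page = 1;
while ($found) {
  $document = new DOMDocument();
  $document->loadXML('<merchant/>');
  for ($i = 0; $found && $i < $limit; $i++) {
    // copy the current product into the small page document
    $document->documentElement->appendChild($reader->expand($document));
    $found = $reader->next('product');
  }
  // assumes the feeds/pages/ directory exists
  $document->save('feeds/pages/page-'.$page.'.xml');
  $page++;
}

Your page request then only has to load a tiny, pre-built XML file instead of touching the 60MB feed.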

You should also consider parsing the file (with XMLReader+DOM) into a database (SQLite, ...) or a search index (Elasticsearch, ...). This would allow you to generate filtered results. A rough sketch of such an import follows.
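
Here is what the import could look like with XMLReader + SimpleXML + PDO; the table layout and the selected fields are assumptions:

$pdo = new PDO('sqlite:feeds/products.sqlite');
$pdo->exec(
  'CREATE TABLE IF NOT EXISTS products (pid TEXT PRIMARY KEY, name TEXT, brand TEXT, price TEXT)'
);
$insert = $pdo->prepare(
  'INSERT OR REPLACE INTO products (pid, name, brand, price) VALUES (?, ?, ?, ?)'
);

$reader = new XMLReader();
$reader->open('feeds/awin_affiliates_eu.xml');

// iterate to the first product element
do {
  $found = $reader->read();
} while ($found && $reader->localName !== 'product');

$pdo->beginTransaction(); // a single transaction keeps 50k inserts fast
while ($found) {
  // readOuterXml() returns the current product element as an XML string
  $product = new SimpleXMLElement($reader->readOuterXml());
  $insert->execute([
    (string)$product->pid,
    (string)$product->name,
    (string)$product->brand,
    (string)$product->price,
  ]);
  $found = $reader->next('product');
}
$pdo->commit();

With the data in SQLite you can page with LIMIT/OFFSET and filter with WHERE clauses instead of re-reading the XML on every request.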

PS: By the way, your XML file looks broken. It defines Atom as the default namespace, and I cannot see any element using the g prefix that binds the Google namespace. I would expect merchant and product to be part of that namespace.
