
Scraping a website with x-ray

I'm using x-ray to extract some data from a website, but when I try to crawl to another page using the built-in functionality, it simply doesn't work.

unitPrice is the field I want to extract, but I get undefined every time.

As you can see, I pass the nested x() call the same href selector that I use for the url property.

var Xray = require('x-ray');
// Create a single x-ray instance with the custom filters
var x = Xray({
  filters: {
    cleanPrice: function (value) {
      return typeof value === 'string' ? value.replace(/\r|\t|\n|€/g, "").trim() : value
    },
    whiteSpaces: function (value) {
      return typeof value === 'string' ? value.replace(/ +/g, ' ').trim() : value
    }
  }
});

x('https://www.simply.es/compra-online/aceite-vinagre-y-sal.html',
  '#content > ul',
  [{
    name: '.descripcionProducto | whiteSpaces',
    categoryId: 'input[name="idCategoria"]@value',
    productId: 'input[name="idProducto"]@value',
    url: 'li a@href',
    price: 'span | cleanPrice',
    image: '.miniaturaProducto@src',
    unitPrice: x('li a@href', '.precioKilo')
  }])
  .paginate('.link@href')
  .limit(1)
  // .delay(500, 1000)
  // .throttle(2, 1000)
  .write('results.json')
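
To rule out the filters as the source of the undefined value, they can be run on their own, without x-ray or network access. This is a minimal sketch: the function bodies are copied from the config above, while the sample strings are hypothetical and not taken from the real page.

```javascript
// Same filter bodies as in the Xray({ filters: ... }) config above,
// pulled out as plain functions so they run without x-ray or a network call.
function cleanPrice(value) {
  return typeof value === 'string' ? value.replace(/\r|\t|\n|€/g, '').trim() : value;
}

function whiteSpaces(value) {
  return typeof value === 'string' ? value.replace(/ +/g, ' ').trim() : value;
}

// Hypothetical sample inputs (assumed for illustration only).
const rawPrice = '\r\n\t 1,25 €\n';
const rawName = '  Aceite   de  oliva  ';

console.log(JSON.stringify(cleanPrice(rawPrice)));  // "1,25"
console.log(JSON.stringify(whiteSpaces(rawName)));  // "Aceite de oliva"
console.log(cleanPrice(undefined));                 // undefined
```

Both filters return non-string input unchanged, so a selector that matches nothing still comes out as undefined after filtering. Note that unitPrice has no filter attached at all, so its undefined value has to come from the nested x() call itself.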

There's a pull request that fixes this. Until it is merged, you can apply the fix yourself; it is just one line of code. See:

https://github.com/lapwinglabs/x-ray/pull/181


 