
Is it possible to scrape data from multiple websites with Node.js?

I have a mobile barcode scanner app written in JavaScript, and I want to pass the scanned UPC code to a web scraper to get information about the product.

Currently, the scraper can get the title of a video game from a UPC database:

const rp = require('request-promise');
const cheerio = require('cheerio');

const options = {
    // 722674120708 is only a test code; the scanned UPC should go in the URL instead.
    uri: `https://barcodeindex.com/upc/722674120708/`,
    transform: function (body) {
        return cheerio.load(body);
    }
};

rp(options)
    .then(($) => {
        console.log($('#item-sub-title').text());
    })
    .catch((err) => {
        console.log(err);
    });

If I wanted to scrape the title of the video game and then use that title to scrape Metacritic.com for information on the video game, how would I do this? Or is it even possible?

Yes, it's possible. Use an HTTP client library such as request, like this:

const request = require('request')
// Placeholder URL; request needs the protocol (http:// or https://) in the URI.
request('https://url.com', (error, response, body) => {
  if (error) throw error
  if (response && response.statusCode === 200) {
    // Here we call your findVideogameTitle function, which searches for the
    // videogame title enclosing tag and extracts the element text.
    console.log(findVideogameTitle(body))
  } else {
    console.log(`Something happened: ${response.statusCode}`)
  }
})
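To chain the two scrapes, the title taken from the first page can be used to build the URL for the second request. Below is a minimal sketch of that flow, assuming Node 18+ (where `fetch` is a global). The names `buildSearchUrl`, `fetchPage`, and `lookupGame` are hypothetical, and the Metacritic search URL pattern is an assumption that should be checked against the real site. The two extractor arguments stand in for whatever parsing you already do, such as the cheerio selector in the question.

```javascript
// Hypothetical helper: turn a scraped title into a search URL.
// The URL pattern is an assumption; verify it against the real site.
function buildSearchUrl(title) {
  return `https://www.metacritic.com/search/${encodeURIComponent(title.trim())}/`;
}

// Default page fetcher, assuming Node 18+ where fetch is a global.
async function fetchPage(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text();
}

// Chain the two scrapes: UPC page -> title -> second page -> details.
// extractTitle and extractDetails are caller-supplied parsers (for example,
// cheerio selectors). getPage is injectable so the flow can be run offline.
async function lookupGame(upc, extractTitle, extractDetails, getPage = fetchPage) {
  const upcHtml = await getPage(`https://barcodeindex.com/upc/${upc}/`);
  const title = String(extractTitle(upcHtml) || '').trim();
  if (!title) {
    throw new Error(`No title found for UPC ${upc}`);
  }
  // The second request only starts once the first one has produced a title.
  const detailsHtml = await getPage(buildSearchUrl(title));
  return { title, details: extractDetails(detailsHtml) };
}
```

With cheerio, `extractTitle` could be something like `(html) => cheerio.load(html)('#item-sub-title').text()`. The selector for the second page depends on that page's markup and is not shown here.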

If the page you are scraping loads its content lazily on the client instead of rendering it on the server, the raw HTML will not contain the data and you may need a full headless browser such as puppeteer. It's quite easy to use, but it consumes much more CPU and memory than a plain HTTP request.
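For the headless-browser case, a rough sketch with puppeteer might look like the following. The function name `getRenderedText` is hypothetical, and the `puppeteer` module is passed in as an argument (for example `require('puppeteer')`) so the sketch itself has no hard dependency. The `networkidle2` wait condition is one reasonable choice, not the only one.

```javascript
// Render a page in a headless browser and return the text of one element.
// `puppeteer` is supplied by the caller, e.g. require('puppeteer').
async function getRenderedText(puppeteer, url, selector) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled before reading the DOM.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Make sure the client-rendered element actually exists.
    await page.waitForSelector(selector);
    const text = await page.$eval(selector, (el) => el.textContent);
    return (text || '').trim();
  } finally {
    // Always close the browser, even on errors, to free CPU and memory.
    await browser.close();
  }
}
```

It could be called as `getRenderedText(require('puppeteer'), url, '#item-sub-title')`, reusing the selector from the question if the target page keeps that id.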
