Is it possible to scrape data from multiple websites using Node.js?

Question

I have a mobile barcode-scanner app written in JavaScript, and I want to pass the scanned UPC code to a web scraper to get information about the product.

The scraper can currently get the title of a video game from a UPC database.

const rp = require('request-promise');
const cheerio = require('cheerio');

const options = {
    // The number in the URL is only a test code; the scanned UPC barcode should be substituted here.
    uri: `https://barcodeindex.com/upc/722674120708/`,
    transform: function (body) {
        return cheerio.load(body);
    }
};

rp(options)
    .then(($) => {
        console.log($('#item-sub-title').text());
    })
    .catch((err) => {
        console.log(err);
    });
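Substituting the scanned code into the URL might look roughly like the sketch below. The helper name `buildUpcUrl` is hypothetical, and the 12-digit check assumes the scanner returns UPC-A codes; other barcode formats would need a different pattern.

```javascript
// Hypothetical helper: builds the barcodeindex.com lookup URL for a scanned UPC.
// Assumption: the scanner yields a 12-digit UPC-A code, possibly with stray whitespace.
function buildUpcUrl(upc) {
  const code = String(upc).trim();
  if (!/^\d{12}$/.test(code)) {
    throw new Error(`Expected a 12-digit UPC, got "${upc}"`);
  }
  return `https://barcodeindex.com/upc/${code}/`;
}
```

The result can then be passed as the `uri` option in place of the hardcoded test URL.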

If I wanted to scrape the title of the video game and then use that title to scrape Metacritic.com for information about the game, how would I do this? Is it even possible?

Answer 1

Yes, it's possible. You would use an HTTP client library such as request, like this:

const request = require('request')
// 'https://url.com' is a placeholder; request needs a full URL including the protocol.
request('https://url.com', (error, response, body) => {
  if (error) throw error
  if (response && response.statusCode === 200) {
    // Here we call your findVideogameTitle function, which searches for the
    // videogame title enclosing tag and extracts the element text.
    console.log(findVideogameTitle(body))
  } else {
    console.log(`Something happened: ${response.statusCode}`)
  }
})
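To scrape a second site with the result of the first, the two requests simply need to run in sequence. Here is a minimal sketch of that idea, not a definitive implementation. It assumes Node 18+ (for the built-in `fetch`), and the names `lookupGame`, `fetchHtml`, `parseTitle`, `buildSearchUrl` and `parseDetails` are all hypothetical. In particular, `buildSearchUrl` and `parseDetails` are left for you to write after inspecting the second site's URLs and markup.

```javascript
// Downloads a page and returns its HTML as a string.
// Assumption: Node 18+, where fetch is available globally.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.text();
}

// Chains two scrapes: UPC -> title on the first site, title -> details on the second.
// All callbacks are supplied by the caller (hypothetical names):
//   parseTitle(html)      extracts the game title from the UPC page
//   buildSearchUrl(title) returns the URL to request on the second site
//   parseDetails(html)    extracts whatever you need from the second page
async function lookupGame(upc, { getHtml = fetchHtml, parseTitle, buildSearchUrl, parseDetails }) {
  // Step 1: resolve the UPC to a title.
  const upcPage = await getHtml(`https://barcodeindex.com/upc/${encodeURIComponent(upc)}/`);
  const title = parseTitle(upcPage).trim();
  if (!title) {
    throw new Error(`No title found for UPC ${upc}`);
  }

  // Step 2: only now do we know what to ask the second site for.
  const detailsPage = await getHtml(buildSearchUrl(title));
  return { title, details: parseDetails(detailsPage) };
}
```

With cheerio, as in the question, `parseTitle` could be `(html) => cheerio.load(html)('#item-sub-title').text()`. Passing the parsers in as functions keeps the chaining logic independent of either site's markup.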

If the scraped page is lazy-loaded rather than server-rendered, you may need a full headless browser for the task, such as puppeteer. It's quite easy to use, but it consumes considerably more CPU and memory.
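For the headless-browser case, a rough sketch could look like the following. It assumes the `puppeteer` package is installed; the function name `scrapeRenderedText` and its `browserLib` parameter are hypothetical, and the selector you pass depends entirely on the target page.

```javascript
// Reads the text of one element from a page that renders its content client-side.
// Assumption: `puppeteer` is installed. It is only loaded when no browserLib is
// passed in, which also makes it possible to substitute a stub.
async function scrapeRenderedText(url, selector, browserLib = require('puppeteer')) {
  const browser = await browserLib.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled, then for the element itself.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.waitForSelector(selector);
    // The callback runs inside the page and returns the element's trimmed text.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Always close the browser, otherwise the Chromium process keeps running.
    await browser.close();
  }
}
```

Since every call launches a whole browser, it is probably worth trying the plain HTTP approach first and falling back to this only when the data is missing from the raw HTML.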

Answer 1
0 2018-02-13 21:09:34
