Getting only the index.html content when trying to scrape a React website

Question

When I try to scrape a ReactJS website using Node.js, I get only the content of the index.html file, not the tags that are actually used on the site. Here is what I have tried:

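    const request = require("request");
    const cheerio = require("cheerio");

    const URL = "https://pydata-jal.netlify.com/";

    // Fetch the raw HTML over HTTP; no JavaScript on the page is executed.
    request(URL, (err, res, body) => {
      if (!err && res.statusCode === 200) {
        // Parse the response body and print it back out as HTML.
        const $ = cheerio.load(body);
        console.log($.html());
      }
    });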

What should I do to get all of the tags that are used on the React website?

Also, can I scrape the Hacker Noon website (just as an example), and is that legal?

Answer 1

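Cheerio only parses HTML that has already been rendered (for example, static HTML). In a client-rendered React app, the served index.html is little more than a mount point, and the tags are created in the browser by JavaScript that `request` never runs. To get the React-rendered output, you should rely on a headless browser controlled by a tool such as Puppeteer.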
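
As a rough illustration of that suggestion, here is a minimal sketch. It assumes `puppeteer` and `cheerio` are installed from npm. The helper name `getRenderedHtml`, the `networkidle0` wait condition, and taking the URL from the command line are choices made for this example, not part of the original answer.

```javascript
// Sketch only: assumes `npm install puppeteer cheerio`.

// Open a page in an already-launched browser, let its scripts run,
// and return the DOM serialized as HTML (hypothetical helper name).
async function getRenderedHtml(browser, url) {
  const page = await browser.newPage();
  try {
    // "networkidle0" waits until there have been no network connections
    // for a short while; a heuristic for "React has finished rendering".
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.content();
  } finally {
    await page.close();
  }
}

async function main(url) {
  // Loaded here so that the helper above has no hard dependency on them.
  const puppeteer = require("puppeteer");
  const cheerio = require("cheerio");

  const browser = await puppeteer.launch();
  try {
    const html = await getRenderedHtml(browser, url);
    // From here on, Cheerio works as in the question, now on rendered HTML.
    const $ = cheerio.load(html);
    console.log($.html());
  } finally {
    await browser.close();
  }
}

// Runs only when a URL is passed on the command line.
const targetUrl = process.argv[2];
if (targetUrl) {
  main(targetUrl).catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Saved as, say, `scrape.js`, this could be run with `node scrape.js https://pydata-jal.netlify.com/`. If the page loads data after the network goes quiet, waiting for a specific element with `page.waitForSelector` may be more reliable than `networkidle0`.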

Solution 1
0 Accepted 2019-08-01 14:47:35
