
Getting index.html content while trying to scrape a React website

When I try to scrape a React website using Node.js, I get only the content of the index.html file, not the tags that are rendered on the site. Here is what I have tried:

    const request = require("request");
    const cheerio = require("cheerio");

    const URL = "https://pydata-jal.netlify.com/";

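    // Fetch the page over plain HTTP; no JavaScript is executed on the response.
    request(URL, (err, res, body) => {
      if (!err && res.statusCode === 200) {
        // Parse the raw response body and print it back as HTML.
        const $ = cheerio.load(body);
        console.log($.html());
      }
    });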

What should I do to get all of the tags that are used on the React website?

Also, can I scrape the Hackernoon website (just as an example), and is that legal?

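Cheerio only parses HTML that has already been rendered (for example, static HTML). To get the output that React renders, you need a headless browser controlled by a tool such as Puppeteer.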
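A minimal sketch of the headless-browser approach, assuming Puppeteer is installed (`npm install puppeteer`). The function name `fetchRenderedHtml` and its optional `browserLib` parameter are hypothetical names introduced here for illustration, and waiting for `networkidle0` is an assumption about when the page has finished rendering; some sites may need a different wait condition.

```javascript
// Load a page in headless Chrome, let its JavaScript run, and return the
// resulting DOM serialized as HTML.
// `browserLib` is a hypothetical parameter: it defaults to Puppeteer and is
// only resolved when the function is called without a second argument.
async function fetchRenderedHtml(url, browserLib = require("puppeteer")) {
  const browser = await browserLib.launch();
  try {
    const page = await browser.newPage();
    // Assumption: the React app is done rendering once the network is idle.
    await page.goto(url, { waitUntil: "networkidle0" });
    // page.content() returns the HTML of the rendered DOM, not the raw response.
    return await page.content();
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Example usage (needs Puppeteer installed and network access):
// fetchRenderedHtml("https://pydata-jal.netlify.com/")
//   .then((html) => console.log(html))
//   .catch((err) => console.error(err));
```

The returned string can then be passed to `cheerio.load` exactly as in the question, since at that point it is ordinary rendered HTML.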
