How to scrape a React website using Puppeteer?

I am trying to scrape a website that uses React. In Chrome I can access the element I want, but with Puppeteer I can't access the same element, because the page source doesn't contain those React-rendered elements. Is there any way to access it?

Here is the code:

```javascript
const cheerio = require('cheerio');
const puppeteer = require('puppeteer');

const Url = 'https://angel.co/companies';

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto(Url);
  // Snapshot the page's current HTML and parse it with cheerio
  const html = await page.content();
  const $ = cheerio.load(html); // cheerio.load is synchronous, no await needed

  // const companies = $('div.base.startup').html();
  const companies = $('div.base.startup > div.company.column > div > div.text > div.pitch').text();
  // const companies = document.querySelectorAll('div.base.startup > div.company.column > div > div.text > div.pitch');

  // Comes back empty: the React-rendered elements are not in the HTML snapshot
  console.log(companies);
  await browser.close();
})();
```

I did something similar in Python a couple of months ago. Basically, I waited for an element to appear on the page (as an indicator that React had finished rendering).
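The idea can be illustrated without a browser. The snippet below is a minimal sketch, not how Puppeteer is implemented: `waitFor` and `createFakePage` are hypothetical names, and the fake page simply adds an element after a delay to mimic React rendering on the client. Reading right away finds nothing; polling until the element exists (with a timeout, so a missing element fails loudly instead of hanging forever) finds it.

```javascript
// Hypothetical helper (not a Puppeteer API): call `check` repeatedly until it
// returns a truthy value, or reject once `timeoutMs` has elapsed.
function waitFor(check, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = () => {
      let result;
      try {
        result = check();
      } catch (err) {
        reject(err);
        return;
      }
      if (result) {
        resolve(result); // element found: the page has "finished rendering"
      } else if (Date.now() >= deadline) {
        reject(new Error(`Timed out after ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs); // not there yet, check again shortly
      }
    };
    tick();
  });
}

// Hypothetical stand-in for a React page: the ".pitch" element only appears
// `renderDelayMs` after the page is "loaded", like client-side rendering.
function createFakePage(renderDelayMs) {
  const dom = new Map();
  setTimeout(() => dom.set('.pitch', 'We build things'), renderDelayMs);
  return { query: (selector) => dom.get(selector) };
}

const page = createFakePage(200);
console.log(page.query('.pitch')); // undefined: read too early, nothing rendered yet
waitFor(() => page.query('.pitch')).then((text) => console.log(text)); // We build things
```

A real scraper should not need a hand-rolled loop like this, since Puppeteer has its own waiting helpers; the sketch only shows why reading the page immediately after navigation can miss React-rendered content.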

I looked up Puppeteer's API and found page.waitForSelector, which I think you could use here.
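Applied to the question's code, a minimal sketch might look like this. It assumes the selector from the question still matches angel.co's markup, which may no longer be true. `scrapePitches` and `PITCH_SELECTOR` are hypothetical names. The function takes an already-open Puppeteer page, so it can be exercised without launching Chrome. It waits with `page.waitForSelector` and then reads the text inside the browser with `page.$$eval`, so cheerio is no longer needed.

```javascript
// Selector copied from the question; it reflects angel.co's markup at the time
// and is an assumption about the live site.
const PITCH_SELECTOR =
  'div.base.startup > div.company.column > div > div.text > div.pitch';

// `page` is an open Puppeteer Page (or any object with the same methods).
async function scrapePitches(page, url, { timeout = 30000 } = {}) {
  await page.goto(url);
  // Resolves once the selector matches an element in the DOM, i.e. after React
  // has rendered it; rejects if that does not happen within `timeout` ms.
  await page.waitForSelector(PITCH_SELECTOR, { timeout });
  // $$eval runs the callback in the page with an array of all matching elements.
  return page.$$eval(PITCH_SELECTOR, (els) =>
    els.map((el) => el.textContent.trim())
  );
}
```

To run it for real, create the page the same way the question does (`puppeteer.launch()` then `browser.newPage()`), call `await scrapePitches(page, 'https://angel.co/companies')`, and close the browser afterwards. Passing `{ waitUntil: 'networkidle0' }` to `page.goto` is another option, but an idle network does not guarantee that a specific element has been rendered, so waiting for the selector is the more direct signal.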

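Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM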