
How to scrape a React website using Puppeteer?

I am trying to scrape a website that uses React. In Chrome I can access the element I want, but with Puppeteer I cannot, because the page source it returns does not contain the React-rendered elements. Is there any way to access them?

Here is the code:

const request = require('request-promise');
const cheerio = require('cheerio');
const puppeteer = require('puppeteer');

const Url = 'https://angel.co/companies';

(async () => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();

    await page.goto(Url);
    const html = await page.content();
    // const response = request.get(Url)
    const $ = cheerio.load(html);

    // const companies = $('div.base.startup').html()
    const companies = $('div.base.startup > div.company.column > div > div.text > div.pitch').text();
    // const companies = document.querySelectorAll('div.base.startup > div.company.column > div > div.text > div.pitch')

    console.log(companies);
    await browser.close();
})();

I did something similar with Python a couple of months ago: basically, I waited for an element to appear on the page, as an indicator that React had finished loading.

I looked up Puppeteer's API and found page.waitForSelector, which I guess you could use.
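
A minimal sketch of that idea, untested against the actual site: wait for the element with page.waitForSelector, then read the text in the browser context with page.$$eval instead of parsing page.content(). The selector is copied from the question; the helper name scrapePitches, the main wrapper, and the 30-second timeout are assumptions for illustration. The site may also need a login or block automated browsers, which this does not handle.

```javascript
const TARGET_URL = 'https://angel.co/companies';
// Selector taken from the question; adjust it if the page markup differs.
const PITCH_SELECTOR =
    'div.base.startup > div.company.column > div > div.text > div.pitch';

// Hypothetical helper: `page` is a Puppeteer Page object.
async function scrapePitches(page, url, selector, timeoutMs = 30000) {
    await page.goto(url);
    // Resolves once at least one matching element is in the DOM,
    // and rejects if none appears within timeoutMs.
    await page.waitForSelector(selector, {timeout: timeoutMs});
    // The callback runs inside the page, after React has rendered.
    return page.$$eval(selector, (nodes) =>
        nodes.map((node) => node.textContent.trim()));
}

// Hypothetical entry point; puppeteer is loaded only when this is called.
async function main() {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch({headless: false});
    try {
        const page = await browser.newPage();
        const pitches = await scrapePitches(page, TARGET_URL, PITCH_SELECTOR);
        console.log(pitches);
    } finally {
        // Close the browser even if the selector never shows up.
        await browser.close();
    }
}
```

Calling main() runs the whole thing. If waitForSelector times out, the element is probably not being rendered at all (wrong selector, login wall, or bot detection), not just rendered late.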
