
How to scrape an image src using puppeteer in NodeJS?

I'm trying to scrape the src of the first image that has a specific class. The page contains multiple images with different additional classes, but they all share the class opwvks06. I have tried the following:

const puppeteer = require('puppeteer');

(async () => {
  let browser, page;
  let url = 'https://www.facebook.com/radiosalue/photos/?ref=page_internal';

  try {
    browser = await puppeteer.launch({ headless: true });
    page = await browser.newPage();
    await page.setViewport({ width: 1366, height: 500 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });

    const image = await page.evaluate(() => {
      const getImage = document
        .querySelector('img[class="opwvks06"]')
        .getAttribute('src');
      return getImage;
    });

    console.log(image);
  } catch (error) {
    console.log(error.message);
  } finally {
    if (browser) {
      await browser.close();
      console.log('closing browser');
    }
  }
})();

However, this returns null. The HTML structure is shown in the following screenshot: [image: enter image description here]

To add to the answer by Mike 'Pomax' Kamermans: all you had to do was wait for the image to appear before querying it, by adding:

await page.waitForSelector("img.opwvks06:first-child");
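The wait-then-read step can also be wrapped in a small helper. This is only a sketch: firstImageSrc is a hypothetical name, page is assumed to be a Puppeteer Page (anything exposing waitForSelector and $eval), and the check on error.name assumes Puppeteer's TimeoutError reports that name.

```javascript
// Hypothetical helper, not part of Puppeteer. `page` is assumed to be a
// Puppeteer Page, or any object exposing waitForSelector and $eval.
async function firstImageSrc(page, selector, timeoutMs = 30000) {
  try {
    // Wait until at least one matching element is present in the DOM.
    await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (error) {
    // Assumption: a timeout surfaces as an error named "TimeoutError".
    if (error.name === 'TimeoutError') return null;
    throw error;
  }
  // $eval runs the callback inside the page against the first match.
  return page.$eval(selector, (img) => img.getAttribute('src'));
}
```

Inside the async function it would be called as const image = await firstImageSrc(page, 'img.opwvks06'); and it resolves to null when nothing matches in time, rather than throwing.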

If the site is protected against bots, you can also try the Puppeteer stealth plugin (puppeteer-extra-plugin-stealth), but that is not necessary in your case. Here is the final code:

  const puppeteer = require("puppeteer");

  (async () => {
    let browser, page;
    let url = "https://www.facebook.com/radiosalue/photos/?ref=page_internal";

    try {
      browser = await puppeteer.launch({ headless: true });
      page = await browser.newPage();
      await page.setViewport({ width: 1366, height: 500 });
      await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
      await page.waitForSelector("img.opwvks06:first-child");

      const image = await page.evaluate(() => {
        const getImage = document.querySelector("img.opwvks06:first-child").getAttribute("src");
        return getImage;
      });

      console.log(image);
    } catch (error) {
      console.log(error.message);
    } finally {
      if (browser) {
        await browser.close();
        console.log("closing browser");
      }
    }
  })();

Output:

https://scontent.fiev13-1.fna.fbcdn.net/v/t39.30808-6/279856934_10159266106247585_585375152905621309_n.jpg?stp=dst-jpg_p206x206&_nc_cat=106&ccb=1-6&_nc_sid=8024bb&_nc_ohc=owbdAyQwP3wAX-8rdo5&_nc_ht=scontent.fiev13-1.fna&oh=00_AT8yJizEIWx8oEFLUBb90ZIIj-Q4WLmmiWtpd1aRVy-UkA&oe=627C10A5
closing browser
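A side note on the selectors, which may also have contributed to the original null result: img[class="opwvks06"] only matches when the entire class attribute is exactly opwvks06, while img.opwvks06 matches any image that has opwvks06 among its classes. The sketch below emulates the two matching rules in plain JavaScript, without a DOM. The helper names and the extra class names are made up for illustration, since the real page's other classes are not shown.

```javascript
// Emulates how CSS matches a class attribute (plain JavaScript, no DOM).
// [class="x"] -> the whole attribute string must equal "x".
// .x          -> "x" must be one of the whitespace-separated class tokens.
function matchesExactAttribute(classAttr, value) {
  return classAttr === value;
}

function matchesClassToken(classAttr, token) {
  return classAttr.split(/\s+/).filter(Boolean).includes(token);
}

// Hypothetical class attribute; the extra class names are placeholders.
const classAttr = 'opwvks06 extraClassA extraClassB';

console.log(matchesExactAttribute(classAttr, 'opwvks06')); // false
console.log(matchesClassToken(classAttr, 'opwvks06'));     // true
```

Also, :first-child restricts the match to images that are the first child of their parent element. querySelector already returns the first match in document order, so img.opwvks06 on its own is usually enough.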

