How to scrape an image src using puppeteer in NodeJS?
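I am trying to scrape the source (src) of the first image that has a specific class. The page contains multiple images with different additional classes, but they all share the class opwvks06. I tried the following: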
const puppeteer = require('puppeteer');

(async () => {
  let browser, page;
  let url = 'https://www.facebook.com/radiosalue/photos/?ref=page_internal';
  try {
    browser = await puppeteer.launch({ headless: true });
    page = await browser.newPage();
    await page.setViewport({ width: 1366, height: 500 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
    // Attempt: read the src of the first image whose class attribute is opwvks06
    const image = await page.evaluate(() => {
      const getImage = document
        .querySelector('img[class="opwvks06"]')
        .getAttribute('src');
      return getImage;
    });
    console.log(image);
  } catch (error) {
    console.log(error.message);
  } finally {
    if (browser) {
      await browser.close();
      console.log('closing browser');
    }
  }
})();
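One detail of the attempt above is worth spelling out. The attribute selector img[class="opwvks06"] matches only when the entire class attribute is exactly that string, whereas the class selector img.opwvks06 matches any image that lists opwvks06 among its classes. Since the images here carry additional classes, the attribute form is likely to find nothing. The sketch below mimics both matching rules on plain strings; the helper names and the sample class value are illustrative assumptions, not part of any library.

```javascript
// Hypothetical helpers that mimic two CSS matching rules
// on a raw class attribute string.

// [class="name"]: the whole attribute value must equal name.
function matchesExactClassAttribute(classAttr, name) {
  return classAttr === name;
}

// .name: name must be one of the whitespace-separated class tokens.
function matchesClassToken(classAttr, name) {
  return classAttr.split(/\s+/).filter(Boolean).includes(name);
}

// Assumed example: an image carrying opwvks06 plus one made-up extra class.
const sampleClassAttr = 'opwvks06 extra-class';

console.log(matchesExactClassAttribute(sampleClassAttr, 'opwvks06')); // false
console.log(matchesClassToken(sampleClassAttr, 'opwvks06')); // true
```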
Building on Mike 'Pomax' Kamermans's answer, you only need to add:
await page.waitForSelector("img.opwvks06:first-child");
If the site is protected against bots, you can also try Stealth Puppeteer, but that is not necessary in your case. Here is the final code:
const puppeteer = require("puppeteer");

(async () => {
  let browser, page;
  let url = "https://www.facebook.com/radiosalue/photos/?ref=page_internal";
  try {
    browser = await puppeteer.launch({ headless: true });
    page = await browser.newPage();
    await page.setViewport({ width: 1366, height: 500 });
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
    // Wait until a matching image is present in the DOM before querying it
    await page.waitForSelector("img.opwvks06:first-child");
    const image = await page.evaluate(() => {
      const getImage = document.querySelector("img.opwvks06:first-child").getAttribute("src");
      return getImage;
    });
    console.log(image);
  } catch (error) {
    console.log(error.message);
  } finally {
    if (browser) {
      await browser.close();
      console.log("closing browser");
    }
  }
})();
Output:
https://scontent.fiev13-1.fna.fbcdn.net/v/t39.30808-6/279856934_10159266106247585_585375152905621309_n.jpg?stp=dst-jpg_p206x206&_nc_cat=106&ccb=1-6&_nc_sid=8024bb&_nc_ohc=owbdAyQwP3wAX-8rdo5&_nc_ht=scontent.fiev13-1.fna&oh=00_AT8yJizEIWx8oEFLUBb90ZIIj-Q4WLmmiWtpd1aRVy-UkA&oe=627C10A5
closing browser
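A more compact variant of the same idea, offered as a sketch rather than a definitive implementation: document.querySelector already returns the first match in document order, so the plain selector img.opwvks06 is enough to get the first image with that class, while :first-child additionally requires the image to be the first child of its parent. The helper name firstImageSrc and its default timeout are assumptions for illustration; page.waitForSelector and page.$eval are regular Puppeteer Page methods.

```javascript
// Hypothetical helper: waits for the selector, then reads the src of the first match.
async function firstImageSrc(page, selector, timeout = 30000) {
  // Rejects with a timeout error if nothing matches within `timeout` ms.
  await page.waitForSelector(selector, { timeout });
  // $eval runs the callback in the page against the first element matching selector.
  return page.$eval(selector, (img) => img.getAttribute('src'));
}

async function main() {
  // Required lazily so the helper can be reused without launching a browser.
  const puppeteer = require('puppeteer');
  const url = 'https://www.facebook.com/radiosalue/photos/?ref=page_internal';
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1366, height: 500 });
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
    console.log(await firstImageSrc(page, 'img.opwvks06'));
  } catch (error) {
    console.log(error.message);
  } finally {
    await browser.close();
    console.log('closing browser');
  }
}

// main(); // uncomment to run against the live page (needs the puppeteer package and network access)
```

Keeping the wait and the read in one function means the selector is written only once, so the two cannot drift apart the way separate waitForSelector and querySelector strings can.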