Puppeteer never completely loads the page
I've been trying to use Puppeteer to scrape a website, but when I try to take a screenshot the page never finishes loading: it either throws a TimeoutError or hangs indefinitely.
(async () => {
    try {
        const navegador = await puppeteer.launch({headless: false},{defaultViewport: null});
        const pagina = await navegador.newPage();
        await pagina.setDefaultNavigationTimeout(3000);
        await pagina.goto(urlSitio, {waitUntil: 'load'});
        await pagina.setViewport({width: 1920, height: 1080});
        await pagina.waitForNavigation({waitUntil: 'load'});
        await pagina.screenshot({
            fullPage: true,
            path: `temporales/temporal.png`
        });
        await navegador.close();
    } catch (err) {
        console.log(err);
    }
})();
I've tried changing the value in
await pagina.setDefaultNavigationTimeout(3000);
to 0 and several other numbers.
I've tried removing headless: false.
I've also tried all the different options for
await pagina.waitForNavigation({waitUntil: 'load'});
The example website I'm using is https://www.xtract.io/
Error message:
(node:9644) UnhandledPromiseRejectionWarning: TimeoutError: Navigation timeout of 3000 ms exceeded
at C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\LifecycleWatcher.js:106:111
at async FrameManager.navigateFrame (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:90:21)
at async Frame.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:416:16)
at async Page.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:789:16)
at async C:\Users\Samuel\Desktop\somnus-monitor\back\index.js:103:9
(Use `node --trace-warnings ...` to show where the warning was created)
(node:9644) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:9644) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
There appears to be an unnecessary waitForNavigation call here. Since goto has already waited for the page to load, waiting for another navigation that never occurs is going to cause a timeout. Re-add the commented-out line below to reproduce your problem.
const puppeteer = require("puppeteer");

(async () => {
    const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: null,
    });
    try {
        const [page] = await browser.pages();
        await page.setViewport({width: 1920, height: 1080});
        await page.goto("https://www.xtract.io/", {waitUntil: "load"});
        //await page.waitForNavigation({waitUntil: "load"}); // this will timeout
        await page.screenshot({
            fullPage: true,
            path: "temporal.png",
        });
    }
    catch (err) {
        console.error(err);
    }
    await browser.close();
})();
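For contrast, here is a rough sketch of the situation where waitForNavigation is useful: an action on an already-loaded page (such as clicking a link) is expected to start a new navigation. The helper name clickAndWaitForNavigation and its selector argument are made up for illustration; only page.waitForNavigation and page.click are Puppeteer calls.

```javascript
// Hypothetical helper (not part of Puppeteer): use waitForNavigation only
// when an action on the page is expected to trigger a new navigation.
async function clickAndWaitForNavigation(page, selector) {
  // Start waiting before the click so the navigation event is not missed.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: "load" }),
    page.click(selector),
  ]);
  return response;
}
```

After a plain page.goto(...) nothing triggers a second navigation, so there is nothing for waitForNavigation to resolve on and it runs into the timeout.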
As an aside, I don't think you meant to pass multiple objects to puppeteer.launch. It takes a single options object, so just put all of the settings in that one object, as shown above.
I had the same problem and solved it by following this website:
await page.goto('https://ourcodeworld.com', {
    waitUntil: 'load',
    // Remove the timeout
    timeout: 0
});
I would wait for a selector rather than waste time waiting for the whole page to load. Instead, use
page.waitForSelector('#myId')
Waiting for the entire page to load can take a while; this way you wait only for the element you need and then take the screenshot.
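Putting that together, a minimal sketch might look like the following. The helper name screenshotWhenReady, the 10-second timeout, and the choice of 'domcontentloaded' are my own assumptions, and '#myId' is just a placeholder for whatever element you actually need.

```javascript
// Hypothetical helper: navigate, wait only for one element, then screenshot.
// `page` is expected to be a Puppeteer Page (e.g. from browser.newPage()).
async function screenshotWhenReady(page, url, selector, path) {
  // Resolve as soon as the DOM is parsed instead of waiting for every resource.
  await page.goto(url, { waitUntil: "domcontentloaded" });
  // Wait for the element we care about; the 10 s limit is an arbitrary choice.
  await page.waitForSelector(selector, { timeout: 10000 });
  await page.screenshot({ path, fullPage: true });
}

// Example call (placeholder selector):
// await screenshotWhenReady(page, "https://www.xtract.io/", "#myId", "temporal.png");
```

If the selector never appears, waitForSelector rejects with a TimeoutError, so the call should still sit inside a try/catch.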