
Puppeteer never completely loads the page

I've been trying to use Puppeteer to scrape a website, but when I try to take a screenshot the page never finishes loading: it either throws a TimeoutError or simply never completes.

(async () => {
        try{
        const navegador = await puppeteer.launch({headless: false},{defaultViewport: null});
        const pagina = await navegador.newPage();
        await pagina.setDefaultNavigationTimeout(3000);
        await pagina.goto(urlSitio, {waitUntil: 'load'});
        await pagina.setViewport({width: 1920, height: 1080});
        await pagina.waitForNavigation({waitUntil: 'load'});
        await pagina.screenshot({
            fullPage: true,
            path: `temporales/temporal.png`
        });
        await navegador.close();
        }catch(err){
            console.log(err);
        }
    })();

I've tried changing the value in await pagina.setDefaultNavigationTimeout(3000); to 0 and to several other numbers.

I've tried removing headless: false.

I've also tried every available waitUntil option in:

await pagina.waitForNavigation({waitUntil: 'load'});

The example website I'm using is https://www.xtract.io/

Error message:

(node:9644) UnhandledPromiseRejectionWarning: TimeoutError: Navigation timeout of 3000 ms exceeded
    at C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\LifecycleWatcher.js:106:111
    at async FrameManager.navigateFrame (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:90:21)
    at async Frame.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\FrameManager.js:416:16)
    at async Page.goto (C:\Users\Samuel\Desktop\somnus-monitor\back\node_modules\puppeteer\lib\cjs\puppeteer\common\Page.js:789:16)
    at async C:\Users\Samuel\Desktop\somnus-monitor\back\index.js:103:9
(Use `node --trace-warnings ...` to show where the warning was created)
(node:9644) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:9644) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

There appears to be an unnecessary waitForNavigation call here. Since goto has already waited for the page's load event, waiting for another navigation that never occurs will cause a timeout. Uncomment the commented-out line below to reproduce your problem.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false, 
    defaultViewport: null,
  });

  try {
    const [page] = await browser.pages();
    await page.setViewport({width: 1920, height: 1080});
    await page.goto("https://www.xtract.io/", {waitUntil: "load"});
    //await page.waitForNavigation({waitUntil: "load"}); // this will timeout
    await page.screenshot({
      fullPage: true,
      path: "temporal.png",
    });
  }
  catch (err) {
    console.error(err);
  }

  await browser.close();
})();

As an aside, I don't think you meant to pass multiple objects to puppeteer.launch. Just put all of the settings in a single object and pass it as the only argument, as shown above.
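When a navigation really is expected after the initial load (for example, after clicking a link), waitForNavigation is usually started together with the action that triggers it. The following is only a minimal sketch of that pattern; the helper name clickAndWaitForNavigation and its parameters are hypothetical and not part of Puppeteer, while page.waitForNavigation and page.click are the real Page methods.

```javascript
// Hypothetical helper (not part of Puppeteer): start listening for the
// navigation *before* triggering it, so the event cannot be missed.
async function clickAndWaitForNavigation(page, selector, options = { waitUntil: "load" }) {
  const [response] = await Promise.all([
    page.waitForNavigation(options), // resolves once the triggered navigation finishes
    page.click(selector),            // the action that actually causes the navigation
  ]);
  return response;
}
```

It would be called as await clickAndWaitForNavigation(page, "a.next"), where "a.next" is a placeholder selector. Without a triggering action, as in the question, there is nothing for waitForNavigation to wait for.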

I had the same problem and solved it by following this page:

solve this question

 await page.goto('https://ourcodeworld.com', {
     waitUntil: 'load',
     // Remove the timeout
     timeout: 0
 });
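Note that timeout: 0 disables the navigation timeout entirely, so a page that never fires its load event would leave the script waiting indefinitely. As a possible alternative, here is a rough sketch that keeps a finite timeout and treats a timeout as a soft failure. The helper name gotoWithSoftTimeout is hypothetical, and the check on err.name assumes Puppeteer's timeout errors are named "TimeoutError", as the error message in the question suggests.

```javascript
// Hypothetical helper: returns true if the page loaded in time, false on timeout.
async function gotoWithSoftTimeout(page, url, timeoutMs = 30000) {
  try {
    await page.goto(url, { waitUntil: "load", timeout: timeoutMs });
    return true;
  } catch (err) {
    // Assumption: Puppeteer's timeout errors carry the name "TimeoutError".
    if (err && err.name === "TimeoutError") {
      return false;
    }
    throw err; // any other failure (bad URL, network error, ...) is rethrown
  }
}
```

The caller can then decide whether to take the screenshot anyway when the function returns false, since much of the page may already be rendered by then.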

I would wait for a selector rather than waste time waiting for the whole page to load. Loading the entire page can take a while; instead, use page.waitForSelector('#myId') to wait only for the element you need, and then take the screenshot.
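The selector-based approach could look roughly like the sketch below. The helper name screenshotWhenReady, its parameters, and the default timeout are illustrative assumptions; page.goto, page.waitForSelector, and page.screenshot are the real Page methods. Using waitUntil: "domcontentloaded" is one possible choice here, not something the answer above prescribes.

```javascript
// Hypothetical helper: navigate, wait for one element, then capture the page.
// `page` is a Puppeteer Page; `selector` and `path` are supplied by the caller.
async function screenshotWhenReady(page, url, selector, path, timeoutMs = 30000) {
  // Return from goto once the HTML is parsed, without waiting for every asset.
  await page.goto(url, { waitUntil: "domcontentloaded", timeout: timeoutMs });
  // Wait until the element we care about is present and visible.
  await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
  await page.screenshot({ path, fullPage: true });
  return path;
}
```

For example, await screenshotWhenReady(page, "https://www.xtract.io/", "#myId", "temporal.png"), where "#myId" stands in for whichever element signals that the content you need has rendered.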


 