
How to click an array of links in puppeteer?

I'm new to Puppeteer and am trying to understand how it works by writing a simple scraping job.

What I plan to do

The plan is simple:

  1. Go to a page.
  2. Extract all the links inside the <li> items of a <ul> tag.
  3. Click each link and take a screenshot of the target page.

How I implemented it

The code is as follows:

  await page.goto('http://some.url.com');                 // step 1
  const a_elems = await page.$$('li.some_css_class a');   // step 2

  // txt is a label for the current link; its definition is not shown in this snippet
  for (let i = 0; i < a_elems.length; i++) {              // step 3
    const elem = a_elems[i];
    await Promise.all([
      elem.click(),
      page.waitForNavigation({waitUntil: 'networkidle0'})   // click the link and wait for the page to load
    ]);
    await page.screenshot({path: `${IMG_FOLDER}/${txt}.png`});

    await page.goBack({waitUntil: 'networkidle0'});      // go back to the previous page so that the next link can be clicked
    console.log(`clicked link = ${txt}`);
  }

What goes wrong and where I need help

However, the code above only works for the first link in a_elems. When the for-loop reaches the second link, it fails with this error:

(node:40606) UnhandledPromiseRejectionWarning: Error: Node is detached from document
    at ElementHandle._scrollIntoViewIfNeeded (.../.npm-packages/lib/node_modules/puppeteer/lib/JSHandle.js:203:13)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async ElementHandle.click (.../.npm-packages/lib/node_modules/puppeteer/lib/JSHandle.js:282:5)
    at async Promise.all (index 0)
    at async main (.../test.js:34:5)
  -- ASYNC --
    at ElementHandle.<anonymous> (.../.npm-packages/lib/node_modules/puppeteer/lib/helper.js:111:15)
    at main (.../test.js:35:12)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)

I suspect that the execution context of page changes once the first link is clicked, and that calling page.goBack to return to the previous page does not restore the original execution context.

I'm not sure whether my guess is right, and I couldn't find a similar issue anywhere, so I hope I can get some help here. Thanks!

If there is a better way to implement my plan, please let me know.

You are right: the element handles lose their context once the page navigates away, and goBack does not restore it, so that approach is not going to work.
But, as you commented, you can grab the href from each element and start from there:


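for (let i = 0; i < a_elems.length; i++) {              // step 3
  const elem = a_elems[i];
  const href = await page.evaluate(e => e.href, elem); // Chrome will return the absolute URL
  // Assumption: use the link text as the screenshot file name (txt was not defined in the question)
  const txt = await page.evaluate(e => e.textContent.trim(), elem);
  const newPage = await browser.newPage();             // the original page never navigates, so a_elems stay valid
  await newPage.goto(href);
  await newPage.screenshot({path: `${IMG_FOLDER}/${txt}.png`});
  await newPage.close();
  console.log(`visited link = ${txt}`);
}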
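
A variation on the same idea, as a rough sketch rather than a definitive implementation: read the hrefs and link texts as plain data in one go with page.$$eval, so that nothing depends on element handles at all, and then reuse the same page for every visit. The names safeName, screenshotLinks, selector and imgFolder are made up for illustration; page is assumed to be an already opened Puppeteer page.

```javascript
// Hypothetical helper: turn a link's text into a file-system-safe name.
function safeName(text, index) {
  const slug = text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')   // collapse anything non-alphanumeric into a dash
    .replace(/^-+|-+$/g, '');      // drop leading and trailing dashes
  return `${String(index).padStart(3, '0')}-${slug || 'link'}`;
}

// Assumes `page` is an open Puppeteer Page; `selector` and `imgFolder` are placeholders.
async function screenshotLinks(page, selector, imgFolder) {
  // Pull out plain strings, not element handles, so later navigation cannot invalidate them.
  const links = await page.$$eval(selector, anchors =>
    anchors.map(a => ({ href: a.href, text: a.textContent }))
  );

  for (const [i, link] of links.entries()) {
    await page.goto(link.href, { waitUntil: 'networkidle0' });
    await page.screenshot({ path: `${imgFolder}/${safeName(link.text, i)}.png` });
  }
  return links.length;
}
```

With the selector from the question, the call would look like `await screenshotLinks(page, 'li.some_css_class a', IMG_FOLDER)`. Note that this visits each URL directly instead of clicking, so it only fits links that are ordinary navigations and not click handlers.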

You could even process the links in parallel, although Puppeteer keeps an internal queue for screenshots, so the screenshot step itself may still be serialized.
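
For the parallel version, one possible sketch is to open a few tabs at a time instead of one tab per link all at once, which keeps memory use bounded. Everything named here (chunk, screenshotInBatches, the batch size of 3, and the { href, name } shape of each entry) is an assumption for illustration; browser is assumed to be the object returned by puppeteer.launch().

```javascript
// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Assumes `browser` is a Puppeteer Browser and `links` is an array of
// { href, name } objects collected beforehand (hypothetical shape).
async function screenshotInBatches(browser, links, imgFolder, batchSize = 3) {
  for (const batch of chunk(links, batchSize)) {
    // Each link in the batch gets its own tab; the batch finishes before the next starts.
    await Promise.all(batch.map(async ({ href, name }) => {
      const tab = await browser.newPage();
      try {
        await tab.goto(href, { waitUntil: 'networkidle0' });
        await tab.screenshot({ path: `${imgFolder}/${name}.png` });
      } finally {
        await tab.close();   // close the tab even if navigation or the screenshot fails
      }
    }));
  }
}
```

Because Promise.all rejects as soon as one tab fails, a single bad link aborts the run; wrapping the body in its own try/catch would be the place to change that if skipping failures is preferred.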
