
Puppeteer/jQuery: selectors don't work in scrolling script

I am trying to write a basic script that scrolls down to the bottom of the Hacker News site. The scrolling implementation is taken from this Stack Overflow question (second answer, by kimbaudi, first method).

The implementation works by repeatedly measuring the .length of a list of elements (as matched by a selector) while scrolling, to determine whether the browser has reached the bottom of that list.
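The count-and-scroll idea described above can be sketched independently of Puppeteer. This is only a minimal sketch: scrollUntilStable, makeFakePage, and the maxRounds safety limit are hypothetical names introduced here for illustration, and the fake page simply pretends that 30 more items appear per scroll.

```javascript
// Generic "scroll until the element count stops growing" loop.
// getCount, scrollDown and wait are injected, so the loop itself
// does not depend on Puppeteer. maxRounds is a hypothetical safety limit.
async function scrollUntilStable({ getCount, scrollDown, wait, maxRounds = 50 }) {
  let rounds = 0;
  let preCount;
  let postCount = await getCount();
  do {
    preCount = postCount;
    await scrollDown();
    await wait();
    postCount = await getCount();
    rounds += 1;
  } while (postCount > preCount && rounds < maxRounds);
  return { count: postCount, rounds };
}

// Stand-in for a page that loads 30 more items per scroll, up to 90.
function makeFakePage() {
  let items = 30;
  return {
    getCount: async () => items,
    scrollDown: async () => { items = Math.min(items + 30, 90); },
    wait: async () => {},
  };
}

scrollUntilStable(makeFakePage()).then(result => console.log(result));
// { count: 90, rounds: 3 }
```

With a real page, getCount would presumably wrap page.$$eval("tr.athing", a => a.length) as in the question, and the loop would stop as soon as a scroll no longer increases that count.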

For my selector, I chose the HTML element that holds each article on Hacker News, tr.athing, with the intent of scrolling down to the bottom-most article link. However, even though the content matched by tr.athing can be printed (as seen in the code below), I get the following error:

Error: Error: failed to find element matching selector "tr.athing:last-child"

What is going wrong?

const puppeteer = require("puppeteer");
const cheerio = require('cheerio');

const link = 'https://news.ycombinator.com/';

// 2 functions used in scrolling
async function getCount(page) {
  await console.log(page.$$eval("tr.athing", a => a.length));
  return await page.$$eval("tr.athing", a => a.length);
}

async function scrollDown(page) {
  await page.$eval("tr.athing:last-child", e => {
    e.scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}


// puppeteer usage as normal
puppeteer.launch({ headless: false }).then(async browser => {

  const page = await browser.newPage();
  const navigationPromise = page.waitForNavigation();
  await page.setViewport({ width: 1500, height: 800 });

  // Loading page
  await page.goto(link);
  await navigationPromise;
  await page.waitFor(1000);

  // Using cheerio to inject jquery into page.
  const html = await page.content();
  const $ = await cheerio.load(html);

  // This works
  var selection = $('tr.athing').text();

  await console.log('\n');
  await console.log(selection);
  await console.log('\n');

  // Error, this does not work for some reason;
  // scrolling code starts here.
  const delay = 10000;
  let preCount = 0;
  let postCount = 0;

  do {
    preCount = getCount(page);
    scrollDown(page);
    page.waitFor(delay);
    postCount = getCount(page);
  } while (postCount > preCount);
  page.waitFor(delay);


//  await browser.close();

})

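The :last-child selector does not give you the last element that matches tr.athing. It matches an element only if that element is the last child of its parent, so tr.athing:last-child finds nothing when the parent's final row is not a tr.athing, which, judging by the error, is the case here.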

The :last-child selector matches every element that is the last child of its parent.

You could do something like this instead:

async function scrollDown(page) {
  // Collect every tr.athing, then scroll the last one in the list into view.
  await page.$$eval("tr.athing", els => {
    els[els.length - 1].scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}
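To see why the original selector matches nothing while the index-based version works, here is a small model of the two lookups on a plain array. The row layout is an assumption made for illustration (each story row followed by rows without the athing class), and athingLastChild and lastAthing are hypothetical helpers, not Puppeteer or DOM APIs.

```javascript
// Simplified model of one parent's children, in document order.
// Assumed layout: every story row is followed by non-"athing" rows.
const rows = [
  { className: "athing", id: "story-1" },
  { className: "", id: "subtext-1" },
  { className: "spacer", id: "spacer-1" },
  { className: "athing", id: "story-2" },
  { className: "", id: "subtext-2" },
  { className: "spacer", id: "spacer-2" },
];

// Mimics "tr.athing:last-child": the element must have the class
// AND be the last child of its parent.
function athingLastChild(children) {
  const last = children[children.length - 1];
  return last && last.className === "athing" ? last : null;
}

// Mimics the fix: collect every "tr.athing", then take the last match.
function lastAthing(children) {
  const matches = children.filter(row => row.className === "athing");
  return matches.length > 0 ? matches[matches.length - 1] : null;
}

console.log(athingLastChild(rows)); // null
console.log(lastAthing(rows).id);   // story-2
```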

Also note that your code is missing several awaits:

do {
  preCount = await getCount(page);
  await scrollDown(page);
  await page.waitFor(delay);
  postCount = await getCount(page);
} while (postCount > preCount);
await page.waitFor(delay);
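The missing awaits matter for the loop condition as well. Without await, getCount(page) returns a Promise, so postCount > preCount compares two Promise objects instead of two numbers and evaluates to false, which ends the loop after a single pass. The sketch below shows this with fakeGetCount, a hypothetical stand-in for getCount(page) that needs no browser.

```javascript
// Stand-in for getCount(page): an async function that resolves to a number.
async function fakeGetCount(value) {
  return value;
}

async function compareCounts() {
  // Without await, both variables hold Promise objects.
  const preNoAwait = fakeGetCount(30);
  const postNoAwait = fakeGetCount(60);
  const withoutAwait = postNoAwait > preNoAwait; // compares Promises, not numbers

  // With await, the resolved numbers are compared.
  const pre = await fakeGetCount(30);
  const post = await fakeGetCount(60);
  const withAwait = post > pre;

  return { withoutAwait, withAwait };
}

compareCounts().then(result => console.log(result));
// { withoutAwait: false, withAwait: true }
```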


 