How to select a DOM Element to scroll on it in Puppeteer

I'm quite new to Puppeteer and async/await syntax. I am trying to build a bot to get data from Instagram; specifically, I would like to get the followers of a given profile. Everything works until the followers window pops up. I would like to select the DOM element so I can scroll on it and push the followers into an array at each iteration. I've searched through the forum and tried different approaches, but they always return undefined. I'm able to get an ElementHandle (scrollBox3) and read properties like scrollHeight, but not the actual DOM element. The code is below, with a description of each part of the file.

Any help would be appreciated :)

The next part defines the URLs and the selectors for the DOM elements. The creds file (./creds) is where my username and password are.

const puppeteer = require('puppeteer');
const CREDS = require('./creds');

// Dom Elements
const loginPage = 'https://www.instagram.com/accounts/login/';
const usernameInput = 'input[name="username"]';
const passwordInput = 'input[name="password"]';
const submitButton = 'button[type="submit"]';
const userToSearch = 'nicolekidman';
const searchUser = `https://www.instagram.com/${userToSearch}`;
const followers = `a[href='/${userToSearch}/followers/']`;

This part collects the followers currently visible in the scroll box into an array.

// Extract followers from a user profile
const extractFollowers = () => {
  let followers = [];
  let elements = document.getElementsByClassName('FPmhX notranslate _0imsa ');
  for (let element of elements)
      followers.push(element.textContent);
  return followers;
}

This is the scroll function where the code breaks. Basically I want to loop and scroll on this scrollBox but I'm unable to grab the DOM Element.

// Scrolling Function
async function scrapeInfiniteScrollItems(
  page,
  extractFollowers,
  followersTargetCount,
  scrollDelay = 1000,
) {
  let items = [];
  // Next 2 lines return undefined
  // .isgrP and .PZuss are classes inside this div, PZuss is the one we want to scroll on
  let scrollBox1 = await page.$eval('.isgrP', el => el.querySelector('body > div:nth-child(15) > div > div > div.isgrP > ul > div'));
  let scrollBox2 = await page.$eval('body > div:nth-child(15) > div > div > div.isgrP > ul > div', el => el);

  // Next line returns an ElementHandle
  let scrollBox3 = await page.$('.PZuss');

  console.log(scrollBox3);
  let scrollBoxHeight = await page.$eval('.PZuss', el => el.scrollHeight);
  console.log(scrollBoxHeight);
  try {
    while (items.length < followersTargetCount) {
      items = await page.evaluate(extractFollowers);
      console.log(extractFollowers());
      // await page.evaluate('scrollBox.scrollTo(0, scrollable_popup.scrollHeight)');
      // await page.waitForFunction(`scrollBox.scrollHeight > ${previousHeight}`);
      // await page.waitFor(scrollDelay);
    }
  } catch(e) { }
  return items;
}

This is the actual async function where I'm doing all the work to access Instagram and call the scroll function to record followers for a given profile.

(async() => {
  // headless false for visual debugging in browser
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto(loginPage, {waitUntil: 'networkidle2'});
  // Type username
  await page.click(usernameInput);
  await page.keyboard.type(CREDS.username);

  // Type password and submit
  await page.click(passwordInput);
  await page.keyboard.type(CREDS.password);
  await page.click(submitButton);
  await page.waitFor(2000);

  // Search User with URL
  await page.goto(searchUser);
  await page.click(followers);
  await page.waitFor(2000);

  const findFollowers = await scrapeInfiniteScrollItems(page, extractFollowers, 100);
  console.log(findFollowers);
  await page.screenshot({ path: '../screenshots/insta.png' });

  // await browser.close();
})();

I got around the issue using the .hover() method. At each iteration I hover over the last element in the div, which triggers a scroll into view. This way I'm able to get the number of followers passed in as a parameter. It's convenient, and the function is shorter this way. I'm still not able to select the DOM element itself, though.

async function scrapeInfiniteScrollItems(
  page,
  extractFollowers,
  followersTargetCount
) {
  let items = [];
  try {
    while (items.length < followersTargetCount) {
      items = await page.evaluate(extractFollowers);
      // Hover over the last loaded follower so it scrolls into view
      const childToSelect = items.length;
      await page.hover(`div.isgrP > ul > div > li:nth-child(${childToSelect})`);
    }
  } catch(e) { }
  items.length = followersTargetCount;
  return items;
}
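
One thing to watch with the hover approach: if the list stops growing before the target is reached, the loop only ends when something throws, and `items.length = followersTargetCount` then pads the array with empty slots. Below is a minimal sketch of a guarded variant, assuming the same selectors as above. The names `scrapeWithHover` and `maxStalls` are hypothetical and introduced only for this illustration; `scrollDelay` is borrowed from the first version of the function.

```javascript
// Sketch only: the same hover technique, with a guard for a list that stops growing.
// `scrapeWithHover` and `maxStalls` are hypothetical names, not part of Puppeteer.
async function scrapeWithHover(
  page,
  extractFollowers,
  followersTargetCount,
  scrollDelay = 1000,
  maxStalls = 3
) {
  let items = [];
  let stalls = 0;
  while (items.length < followersTargetCount && stalls < maxStalls) {
    const before = items.length;
    items = await page.evaluate(extractFollowers);
    if (items.length === 0) break; // nothing rendered yet, so nothing to hover
    // Hovering the last <li> scrolls it into view.
    await page.hover(`div.isgrP > ul > div > li:nth-child(${items.length})`);
    // Give the page time to append more rows before counting again.
    await new Promise((resolve) => setTimeout(resolve, scrollDelay));
    // Count consecutive rounds in which no new followers appeared.
    stalls = items.length === before ? stalls + 1 : 0;
  }
  // slice() truncates to the target without padding a shorter list.
  return items.slice(0, followersTargetCount);
}
```

It would be called the same way as before, for example `await scrapeWithHover(page, extractFollowers, 100)`. The stall limit is a judgment call: a slow connection may need a longer `scrollDelay` or a higher `maxStalls`.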

Right, I'm not familiar with Instagram, but I'm going to try to work through this with you step by step. At a glance there isn't much wrong with your code (unfortunately I have no way of testing it, as I'm not signed up with Instagram), but a few things stand out.

scrapeInfiniteScrollItems function:

let scrollBox1 = await page.$eval('.isgrP', el => el.querySelector('body > div:nth-child(15) > div > div > div.isgrP > ul > div'));
let scrollBox2 = await page.$eval('body > div:nth-child(15) > div > div > div.isgrP > ul > div', el => el);

You point out that both of these lines return undefined. This is because you're not quite using the $eval method correctly. $eval runs a querySelector to locate the DOM element that matches the CSS selector you pass in, and then the function you supply executes JavaScript against that element inside the page.
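
As a small illustration of that pattern, here is a sketch that reads a few properties from the matched element. The callback runs in the browser, and what comes back to Node is its return value, which has to be serializable (plain numbers, strings, arrays, objects). This is why `page.$eval('.PZuss', el => el.scrollHeight)` in the question works. The helper name `readScrollMetrics` is hypothetical.

```javascript
// Sketch: read plain, serializable data about an element through $eval.
// `readScrollMetrics` is a hypothetical helper name, not a Puppeteer API.
async function readScrollMetrics(page, selector) {
  // The callback runs inside the browser; only its return value
  // (here a plain object of numbers) is sent back to Node.
  return page.$eval(selector, (el) => ({
    scrollTop: el.scrollTop,
    scrollHeight: el.scrollHeight,
    clientHeight: el.clientHeight,
  }));
}
```

For example, `const metrics = await readScrollMetrics(page, '.PZuss');` would give an object whose fields can be logged or compared between iterations.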

So let's look at your first line: you're asking it to do a querySelector for an element with class isgrP, but then you're running a further querySelector on that element using a CSS selector that begins with body? This doesn't make sense.

I also see that this strange selector ends with div.isgrP > ul > div, which contains a div with the same class name as the one you originally queried with the $eval method. So did you always intend to find the element at div.isgrP > ul > div?

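You can't bring the DOM element itself back into your Node script: $eval serializes whatever the callback returns, and a DOM node does not survive that, which is why returning el gives you undefined. What you can do is work on the element inside the callback, for example by scrolling it and returning its height:

const scrollHeight = await page.$eval('div.isgrP > ul > div.PZuss', (uiElement) => {
  // uiElement is the real DOM element here, inside the page
  uiElement.scrollTop = uiElement.scrollHeight;
  return uiElement.scrollHeight;
});

Inside the callback, uiElement is your DOM element (not the ElementHandle instance) for the scrollable box which you have been searching for; only the number is returned to Node.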
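
If you would rather keep working from the ElementHandle you already have (scrollBox3 in the question), here is a sketch of another option: page.evaluate accepts handles as arguments and resolves them to the real element inside the page, so the node never has to travel back to Node, where return values must be serializable. The helper name `scrollBoxToBottom` is hypothetical, and whether `.PZuss` is the element that actually scrolls is taken from the question, not verified.

```javascript
// Sketch: scroll a container using the ElementHandle returned by page.$.
// `scrollBoxToBottom` is a hypothetical helper name, not a Puppeteer API.
async function scrollBoxToBottom(page, selector) {
  const handle = await page.$(selector); // ElementHandle, or null if nothing matches
  if (!handle) {
    throw new Error(`No element matches selector: ${selector}`);
  }
  // A handle passed as an argument arrives in the page as the real element.
  return page.evaluate((el) => {
    el.scrollTop = el.scrollHeight;
    return el.scrollHeight; // a number, so it can be returned to Node
  }, handle);
}
```

Calling `await scrollBoxToBottom(page, '.PZuss')` inside the scraping loop, followed by a short wait, is one way to replace the commented-out scrollTo lines in the original function.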

Please let me know if this helps and what is causing your next issue.
