I'm quite new to Puppeteer and async/await syntax. I am trying to build a bot to get data from Instagram; specifically, I would like to get the followers of a given profile. Everything works fine until the followers window pops up. I would like to select the DOM element so that I can scroll it and push the followers into an array at each iteration. I've searched through the forum and tried different approaches, but they always return undefined. I'm able to get an ElementHandle (scrollBox3) and to read properties like scrollHeight, but not the actual DOM element. The code is below, with descriptions of the different parts of the file.
Any help would be appreciated :)
The next part defines the URLs and the selectors for the DOM elements. The creds file is where my username and password are stored.
const puppeteer = require('puppeteer');
const CREDS = require('./creds');
// Dom Elements
const loginPage = 'https://www.instagram.com/accounts/login/';
const usernameInput = 'input[name="username"]';
const passwordInput = 'input[name="password"]';
const submitButton = 'button[type="submit"]';
const userToSearch = 'nicolekidman';
const searchUser = `https://www.instagram.com/${userToSearch}`;
const followers = `a[href='/${userToSearch}/followers/']`;
This part records followers visible in the scrollBox in an array.
// Extract followers from a user profile
const extractFollowers = () => {
  let followers = [];
  let elements = document.getElementsByClassName('FPmhX notranslate _0imsa ');
  for (let element of elements)
    followers.push(element.textContent);
  return followers;
}
This is the scroll function where the code breaks. Basically I want to loop and scroll on this scrollBox but I'm unable to grab the DOM Element.
// Scrolling Function
async function scrapeInfiniteScrollItems(
  page,
  extractFollowers,
  followersTargetCount,
  scrollDelay = 1000,
) {
  let items = [];
  // Next 2 lines return undefined
  // .isgrP and .PZuss are classes inside this div, PZuss is the one we want to scroll on
  let scrollBox1 = await page.$eval('.isgrP', el => el.querySelector('body > div:nth-child(15) > div > div > div.isgrP > ul > div'));
  let scrollBox2 = await page.$eval('body > div:nth-child(15) > div > div > div.isgrP > ul > div', el => el);
  // Next line returns an ElementHandle
  let scrollBox3 = await page.$('.PZuss');
  console.log(scrollBox3);
  let scrollBoxHeight = await page.$eval('.PZuss', el => el.scrollHeight);
  console.log(scrollBoxHeight);
  try {
    while (items.length < followersTargetCount) {
      items = await page.evaluate(extractFollowers);
      console.log(extractFollowers());
      // await page.evaluate('scrollBox.scrollTo(0, scrollable_popup.scrollHeight)');
      // await page.waitForFunction(`scrollBox.scrollHeight > ${previousHeight}`);
      // await page.waitFor(scrollDelay);
    }
  } catch(e) { }
  return items;
}
This is the actual async function where I'm doing all the work to access Instagram and call the scroll function to record followers for a given profile.
(async() => {
  // headless false for visual debugging in browser
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.goto(loginPage, {waitUntil: 'networkidle2'});
  // Type username
  await page.click(usernameInput);
  await page.keyboard.type(CREDS.username);
  // Type password and submit
  await page.click(passwordInput);
  await page.keyboard.type(CREDS.password);
  await page.click(submitButton);
  await page.waitFor(2000);
  // Search User with URL
  await page.goto(searchUser);
  await page.click(followers);
  await page.waitFor(2000);
  const findFollowers = await scrapeInfiniteScrollItems(page, extractFollowers, 100);
  console.log(findFollowers);
  await page.screenshot({ path: '../screenshots/insta.png' });
  // await browser.close();
})();
I got around the issue using the .hover() method. At each iteration I select the last element in the div, which triggers a scroll into view. This way I'm able to get the number of followers passed as a parameter. It's convenient, and the function is shorter this way. I'm still not able to select the DOM element itself, though.
async function scrapeInfiniteScrollItems(
  page,
  extractFollowers,
  followersTargetCount
) {
  let items = [];
  try {
    while (items.length < followersTargetCount) {
      // Collect the followers currently rendered in the list
      items = await page.evaluate(extractFollowers);
      // Hover over the last rendered item so it scrolls into view
      const childToSelect = items.length;
      await page.hover(`div.isgrP > ul > div > li:nth-child(${childToSelect})`);
    }
  } catch(e) { }
  items.length = followersTargetCount;
  return items;
}
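One thing the loop above does not handle is a list that stops growing (for example, a profile with fewer followers than the target), in which case it would keep hovering forever. The following is only a sketch of one way to guard against that; `collectWithHover`, `maxStalls` and `delayMs` are hypothetical names introduced here, and the `li` selector is taken from the code above and assumed to still match the page.

```javascript
// Sketch only: collectWithHover, maxStalls and delayMs are hypothetical names.
async function collectWithHover(page, extractItems, targetCount, maxStalls = 3, delayMs = 500) {
  let items = [];
  let stalls = 0; // consecutive rounds in which no new item appeared
  while (items.length < targetCount && stalls < maxStalls) {
    const before = items.length;
    items = await page.evaluate(extractItems);
    if (items.length === 0) break; // nothing rendered, nothing to hover over
    stalls = items.length > before ? 0 : stalls + 1;
    // Hover over the last rendered item so that it scrolls into view
    await page.hover(`div.isgrP > ul > div > li:nth-child(${items.length})`);
    // Give the page some time to load the next batch
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  // slice() never pads the result, unlike assigning to items.length
  return items.slice(0, targetCount);
}
```

It would be called the same way as the original function, e.g. `await collectWithHover(page, extractFollowers, 100)`. If the list runs out early, the result simply contains fewer entries than requested.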
Right, I'm not familiar with Instagram, but I'm going to try to work through this with you step by step. At a glance there isn't much wrong with your code (unfortunately I have no way of testing it, as I'm not signed up with Instagram), but a few things stand out.
Start with these two lines in your scrapeInfiniteScrollItems function:
let scrollBox1 = await page.$eval('.isgrP', el => el.querySelector('body > div:nth-child(15) > div > div > div.isgrP > ul > div'));
let scrollBox2 = await page.$eval('body > div:nth-child(15) > div > div > div.isgrP > ul > div', el => el);
You point out that both of these lines return undefined. This is because you're not quite using the $eval method correctly. What the $eval method allows you to do is execute a querySelector instruction to locate a specific DOM element (the one that matches the CSS selector you have declared); the internal function then executes JavaScript instructions in real time on that DOM element.
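As a rough illustration of that pattern, the sketch below uses $eval to read a few scroll-related properties from the matched element and hands them back as a plain object. `readScrollMetrics` is a hypothetical helper name, not part of Puppeteer, and the selector you pass in is assumed to match the scrollable box.

```javascript
// Sketch only: readScrollMetrics is a hypothetical helper, not a Puppeteer API.
async function readScrollMetrics(page, selector) {
  // The callback runs inside the page; `el` is the DOM element matched by `selector`.
  return page.$eval(selector, (el) => ({
    scrollTop: el.scrollTop,       // current scroll offset
    scrollHeight: el.scrollHeight, // total height of the content
    clientHeight: el.clientHeight, // visible height of the box
  }));
}
```

For example, `const metrics = await readScrollMetrics(page, '.PZuss');` gives you numbers that you can compare between iterations to see whether more content has loaded.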
So let's look at your first line: you're asking it to do a querySelector for an element with class isgrP, but then you're running a further querySelector on that element which uses a CSS selector that begins with body? This doesn't make sense.
I also see that this strange selector ends with div.isgrP > ul > div, which, coincidentally, contains a div with the same class name as the one you originally queried with the $eval method. So did you always intend to find the element at div.isgrP > ul > div?
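One caveat: whatever the $eval callback returns is serialized before it is sent back to Node, and a DOM element cannot be serialized, so returning the element itself (el => el) will not hand you a usable element. Instead, work on the DOM element inside the callback and return a plain value, by reworking your code as follows:
const scrollTop = await page.$eval('div.isgrP > ul > div.PZuss', (uiElement) => {
  // This runs in the page, where uiElement is the real DOM element
  uiElement.scrollTop = uiElement.scrollHeight;
  return uiElement.scrollTop;
});
This scrolls the box you have been searching for (assuming .PZuss really is the scrollable element, as you describe) without the DOM element ever leaving the page. On the Node side, the ElementHandle instance you get from page.$ is the closest thing to a reference to it.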
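If you would rather hold on to a reference that you can reuse between calls, another option is to keep the ElementHandle from page.$ and pass it as an argument to page.evaluate, where it arrives as the underlying DOM element. The sketch below assumes the selector matches the scrollable box; `scrollHandleToBottom` is a hypothetical name.

```javascript
// Sketch only: scrollHandleToBottom is a hypothetical helper name.
async function scrollHandleToBottom(page, selector) {
  const handle = await page.$(selector); // ElementHandle, or null if nothing matches
  if (!handle) return null;
  // An ElementHandle passed as an argument is received in the page as the DOM element.
  return page.evaluate((el) => {
    el.scrollTop = el.scrollHeight;
    return el.scrollTop;
  }, handle);
}
```

Calling `await scrollHandleToBottom(page, '.PZuss')` once per loop iteration, followed by a short wait, would play the same role as the commented-out scrollTo lines in your original function.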
Please let me know if this helps and what is causing your next issue.