
How to get specific text inside a div in Puppeteer

I'm trying to capture the username of each user on the page. I've tried about five different CSS selectors for the itemArea variable, but I think I'm just not experienced enough with CSS or HTML.

If anyone knows how to grab that info and could point me to a resource on navigating HTML with CSS selectors for this kind of task, that would be very helpful.

I'm new to JavaScript, and this program is far easier for me in Python, but I'd like to see this project through.

const puppeteer = require('puppeteer');
const url = "https://poshmark.com/category/Men-Jackets_&_Coats?sort_by=like_count&all_size=true&my_size=false";
let usernames = [];

async function main() {
 

    const client = await puppeteer.launch({
        headless: true,
        executablePath: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    });

    const page = await client.newPage();

    await page.goto(url);

    await page.waitForSelector(".tc--g.m--l--1.ellipses");

   

    const itemArea = await page.evaluate(() => {

        Array.from(document.querySelectorAll('.tc--g m--l--1 ellipses' )).map(x => x.textContent());
        
        console.log(itemArea);
    });

    itemArea.each(function (i, element) {
        //console.log('username: ', $(element).text());
        usernames.push($(element).text());
        
        console.log(usernames);
    });
};

main();

I've tried many different CSS selectors for a few hours. The main errors I get are:

  1. ReferenceError: itemArea is not defined
  2. The CSS selector is incorrect
  3. Evaluation failed: TypeError: x.textContent is not a function

FYI, the first text I'm trying to grab on that page is ishhbang, which is that user's username. After that, my plan is to write a loop to get all the usernames.

I got this to work in Cheerio using this code:

        let res = await axios.get(url);
        let $ = await cheerio.load(res.data);

        const itemArea = $(".tiles_container a.tile__creator span");

In Puppeteer, that selector gives Evaluation failed: TypeError: x.textContent is not a function

Thanks a lot to anyone who helps.

I see four issues:

  1. The error is correct: textContent is a property, not a function, so remove the ().
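  2. Your selector is incorrect: .tc--g m--l--1 ellipses should be .tc--g.m--l--1.ellipses (you had it right in your waitForSelector). With spaces and no dots, m--l--1 and ellipses are parsed as descendant tag names rather than as classes on the same element.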
  3. You need to return from your evaluate callback; otherwise it returns undefined by default.
  4. The Cheerio-style .each code isn't necessary and won't work with a plain array of strings: arrays have no .each method, and $ isn't defined in your Puppeteer script.
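To see why issues 3 and 4 (and the ReferenceError from the question) happen without launching a browser, here is a plain-Node sketch. Puppeteer serializes the function passed to page.evaluate and runs it inside the page, so the callback cannot see variables from your Node script and only its return value comes back. fakeEvaluate below is a hypothetical stand-in that roughly mimics this with new Function; it is not Puppeteer's API.

```javascript
// fakeEvaluate is a hypothetical stand-in for page.evaluate: it turns the
// callback into source text and rebuilds it with new Function, which runs
// in the global scope, so the callback loses access to any local variables
// that surrounded it in the original code.
function fakeEvaluate(callback) {
  const source = callback.toString();
  return new Function(`return (${source})();`)();
}

// Issue 3: a block-bodied arrow function with no `return` yields undefined.
const noReturn = fakeEvaluate(() => {
  [" ishhbang "].map(text => text.trim());
});

// Fix: return the mapped array (a concise arrow body would also work).
const withReturn = fakeEvaluate(() => {
  return [" ishhbang "].map(text => text.trim());
});

// The question's ReferenceError: `itemArea` only exists in the scope that
// created the callback, so the rebuilt copy cannot resolve it.
function referenceOuterVariable() {
  const itemArea = ["ishhbang"];
  try {
    fakeEvaluate(() => itemArea.length);
    return "no error";
  } catch (err) {
    return err.name;
  }
}

// Issue 4: the result is a plain array of strings, so use for...of
// (or usernames.push(...withReturn)) instead of Cheerio's .each().
const usernames = [];
for (const name of withReturn) {
  usernames.push(name);
}

console.log(noReturn);                 // undefined
console.log(withReturn);               // [ 'ishhbang' ]
console.log(referenceOuterVariable()); // ReferenceError
console.log(usernames);                // [ 'ishhbang' ]
```

The same rule explains the real fix: build the array inside the callback, return it, and do any further processing in Node.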

The first two issues can be debugged in the browser's devtools and aren't specific to Puppeteer. The third issue is essentially the existing question "Get elements from page.evaluate in Puppeteer?".
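For the selector issue, note that the class attribute shown in devtools is space-separated, while a selector for one element carrying all of those classes joins them with dots. The helper below is a hypothetical sketch (not part of Puppeteer or the DOM) that does this conversion, assuming the class names need no CSS escaping; you can then paste the result into document.querySelectorAll in the devtools console to confirm it matches before using it in Puppeteer.

```javascript
// classAttrToSelector is a hypothetical helper for illustration only.
// Assumption: every class name is already a valid CSS identifier, so no
// escaping is done (names starting with a digit would need CSS.escape).
function classAttrToSelector(classAttr) {
  return classAttr
    .trim()
    .split(/\s+/)             // class attributes are whitespace-separated
    .map(name => `.${name}`)  // each class becomes a .class token
    .join("");                // no spaces: all classes on the same element
}

const selector = classAttrToSelector("tc--g m--l--1 ellipses");
console.log(selector); // .tc--g.m--l--1.ellipses

// Then, in the devtools console on the page, check what it matches:
// [...document.querySelectorAll(".tc--g.m--l--1.ellipses")].map(e => e.textContent.trim())
```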

Now, if Cheerio and Axios already work for you, and since the data is baked into the static HTML, just use them. That approach is much faster and less complex than Puppeteer, which is designed for interacting with JS-driven pages. But if you are going to use Puppeteer, drop Cheerio to avoid confusion.

If you're curious, here's Puppeteer code, with some optimizations to block unnecessary resources that add verbosity but give about a 2.5x speedup on my machine:

const puppeteer = require("puppeteer"); // ^19.1.0

const url = "your URL";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setJavaScriptEnabled(false);
  await page.setRequestInterception(true);
  page.on("request", req => {
    if (req.url() !== url) {
      req.abort();
    }
    else {
      req.continue();
    }
  });
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const users = await page.$$eval(
    ".tc--g.m--l--1.ellipses",
    els => els.map(e => e.textContent)
  );
  console.log(users);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Another pointer: Cheerio is fully synchronous, so let $ = await cheerio.load(res.data); doesn't need an await. Here's how I'd write the Cheerio version of the above script:
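A small plain-Node sketch of why that await is unnecessary: awaiting a value that is not a Promise (or other thenable) simply hands the same value back after a microtask tick. parseSync is a hypothetical synchronous function standing in for cheerio.load; it does not parse anything.

```javascript
// parseSync is a hypothetical stand-in for a synchronous API like cheerio.load:
// it returns an ordinary object, not a Promise.
function parseSync(html) {
  return { html, size: html.length };
}

async function compare() {
  const parsed = parseSync("<p>ishhbang</p>");
  const awaited = await parsed; // legal, but there is nothing to wait for
  return {
    isPromise: parsed instanceof Promise, // false: a plain object
    sameObject: awaited === parsed,       // true: await returned the same value
  };
}

compare().then(result => console.log(result));
// { isPromise: false, sameObject: true }
```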

const cheerio = require("cheerio"); // 1.0.0-rc.12

const url = "your URL";

(async () => {
  const response = await fetch(url); // native in Node 18+

  if (!response.ok) {
    throw Error(response.statusText);
  }

  const $ = cheerio.load(await response.text());
  const users = [...$(".tc--g.m--l--1.ellipses")].map(e => $(e).text());
  console.log(users);
})()
  .catch(error => console.log(error));
