
Data scraping problem with elements sharing the same classes in Puppeteer

I'm trying to scrape all of the price data from https://www.bynogame.com/tr/oyunlar/knight-online/gold-bar using Puppeteer.

I can scrape the prices one by one, but I can't get all of the p elements at once: null data is returned. Here is my code that scrapes a single price, followed by the code meant to scrape all of them, which does not work. What am I doing wrong?

const puppeteer = require("puppeteer");
const gb = async () => {

    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.bynogame.com/tr/oyunlar/knight-online/gold-bar');

    const data = await page.$eval('body > div.container.mb-5 > div > div:nth-child(1) > div.col-md-18.order-1.order-sm-12 > div > div:nth-child(1) > div > div > div.col-md-21 > div > div.col-md-4 > div > div > div > p', el => el.textContent); // Works: returns a single price
    
    await browser.close();   
    console.log(data)
};
gb();

// This version, which should return all prices, does not work:
const data = await page.$$eval('.col-md-24 mb-2 itemDiv .itemCard .row d-flex align-items-center .col-md-21 .row d-flex align-items-center .col-md-4 .row d-flex flex-column .col .div p', obj => obj.map(p => p.textContent));

There is no element matching the selector .col-md-24 mb-2 itemDiv .itemCard .row d-flex align-items-center .col-md-21 .row d-flex align-items-center .col-md-4 .row d-flex flex-column .col .div p on the https://www.bynogame.com/tr/oyunlar/knight-online/gold-bar page.

You can check this by going to the page, opening the browser console, and running document.querySelectorAll. For example:

document.querySelectorAll('.col-md-24 mb-2 itemDiv .itemCard .row d-flex align-items-center .col-md-21 .row d-flex align-items-center .col-md-4 .row d-flex flex-column .col .div p')

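It returns an empty NodeList, which means no HTML element matches the given selector. The reason is that a space in a CSS selector is the descendant combinator: .col-md-24 mb-2 itemDiv looks for an itemDiv element inside an mb-2 element inside an element with the class col-md-24, not for one element that carries all three classes. Classes on the same element have to be chained without spaces (.col-md-24.mb-2.itemDiv), and .div matches elements with the class "div", not <div> tags.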
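Because classes that sit on the same element must be joined with dots rather than separated by spaces, turning a class attribute into a working selector is mechanical. Below is a minimal sketch of a hypothetical helper, classAttrToSelector (it is not part of Puppeteer or the DOM API). It assumes the class names contain no characters that need CSS escaping; a class name starting with a digit, for example, would need escaping.

```javascript
// Hypothetical helper (not part of Puppeteer): builds a compound CSS selector
// from an optional tag name and the raw value of an HTML class attribute.
function classAttrToSelector(tag, classAttr) {
  const classes = classAttr
    .trim()
    .split(/\s+/)          // class attribute values are whitespace-separated
    .filter(Boolean)       // drop the empty string left by an empty attribute
    .map((c) => "." + c);  // assumes no class name needs CSS escaping
  // No spaces between the parts: every class applies to the same element
  return (tag || "") + classes.join("");
}

console.log(classAttrToSelector("p", "font-weight-bolder text-black m-0"));
// p.font-weight-bolder.text-black.m-0
console.log(classAttrToSelector("", "col-md-24 mb-2 itemDiv"));
// .col-md-24.mb-2.itemDiv
```

The resulting string can be passed directly to document.querySelectorAll in the console or to page.$$eval in Puppeteer.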

You do not need such a long selector. On the page, you can see that each price is inside a <p> tag with the classes font-weight-bolder text-black m-0. First, check whether any other element shares the same selector as the prices by querying the console for <p> tags that have the classes font-weight-bolder, text-black and m-0:

document.querySelectorAll('p.font-weight-bolder.text-black.m-0')
--- Output ---
NodeList(9) [ p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0, p.font-weight-bolder.text-black.m-0 ]

The output shows that the returned elements are exactly the prices we are looking for, so this selector can be substituted into page.$$eval. The final code is:

const puppeteer = require("puppeteer");
const gb = async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto("https://www.bynogame.com/tr/oyunlar/knight-online/gold-bar");
    // Select every <p class="font-weight-bolder text-black m-0"> and read its text
    const data = await page.$$eval("p.font-weight-bolder.text-black.m-0", (elements) =>
      elements.map((p) => p.textContent)
    );
    console.log(data);
  } finally {
    // Close the browser even if navigation or scraping throws
    await browser.close();
  }
};
gb().catch(console.error);

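Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.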
