
Web scraping a table with Puppeteer

I am trying to scrape this page.

https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214

I want to find the grade counts for PSA 9 and PSA 10. If you look at the HTML of the page, you will notice that PSA does a very bad job (IMO) of displaying the data. Every TR is a player, and the first TD is the card number. Let's say I want to get card number 1, which in this case is Kevin Garnett.

There are four cards in total, so those are the only four cards I want to display.

Here is the code I have.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214");

  // Returns an array of strings: the innerHTML of every table row.
  const tr = await page.evaluate(() => {
    const rows = Array.from(document.querySelectorAll('table tr'));
    return rows.map(row => row.innerHTML);
  });

  const getName = tr.map(name => {
    // This line throws: `name` is an HTML string here, not a DOM element,
    // so it has no querySelectorAll method.
    //const thename = Array.from(name.querySelectorAll('td.card-num'))
    console.log("\n\n" + name + "\n\n");
  });

  await browser.close();
})();

I get each TR printed, but I can't dig into the cells of those TRs. The commented-out line is what I tried, but it throws an error. Right now I'm also not looking up the player dynamically. The easiest approach, I think, would be a function that gets a specific card by selecting the TR whose TD.card-num equals 1 (for Kevin).
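A minimal sketch of the card-number lookup described above, assuming the card number sits in a td with the card-num class (as the .card-num selector in the answer below suggests). The idea is to do the DOM work inside the page: page.$$eval hands its callback real elements and sends back only the plain data the callback returns, so the rows reach Node as objects instead of HTML strings. findCardRow and scrapeRows are hypothetical helper names, not part of Puppeteer.

```javascript
// Hypothetical helper: returns the row whose card number matches cardNum
// (compared as a string), or null if no row matches.
function findCardRow(rows, cardNum) {
  return rows.find((row) => row.cardNum === String(cardNum)) ?? null;
}

// Hypothetical helper: loads the page and returns one plain object per <tr>.
// The callback passed to page.$$eval runs inside the browser, where
// querySelector works; only its serializable return value comes back to Node.
async function scrapeRows(url) {
  const puppeteer = require("puppeteer"); // required here so findCardRow works without Puppeteer installed
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.$$eval("table tr", (trs) =>
      trs.map((tr) => {
        const numCell = tr.querySelector("td.card-num");
        return {
          cardNum: numCell ? numCell.textContent.trim() : "", // "" for header rows
          cells: Array.from(tr.querySelectorAll("td"), (td) => td.textContent.trim()),
        };
      })
    );
  } finally {
    await browser.close(); // close the browser even if navigation fails
  }
}
```

With url set to the GetItemTable address above, scrapeRows(url).then((rows) => console.log(findCardRow(rows, 1))) should log the row for card 1 (Kevin Garnett), including the text of every cell in that row.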

Any help with this would be amazing.

Thanks

Short answer: you can just copy the table and paste it into Excel; it pastes perfectly.

Long answer: if I'm understanding this correctly, you'll need to map over all of the tr elements and then, within each tr, map over its td elements. I use cheerio as a helper for this. To finish it with Puppeteer, get the page HTML with html = await page.content() and pass html into the cleaner I've written below:

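const cheerio = require("cheerio");
const puppeteer = require("puppeteer");

// Parses the table HTML and pulls out the card number, player name,
// and the grade columns for every row.
const test = (html) => {
  const $ = cheerio.load(html);
  const array = $("tr").map((index, element) => {
    const card_num = $(element).find(".card-num").text().trim();
    const player = $(element).find("strong").text();
    // Span text of each cell in this row, in column order.
    const mini_array = $(element).find("td").map((ind, elem) => {
      return $(elem).find("span").text().trim();
    });
    return {
      card_num,
      player,
      column_nine: mini_array[13],
      column_ten: mini_array[14],
      total: mini_array[15],
    };
  });
  // Rows 0 and 1 come before the first card; row 2 is card 1 (Kevin Garnett).
  console.log(array[2]);
};

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214");
  const html = await page.content();
  await browser.close();
  test(html);
  // Or, with a saved copy of the page: test(require("fs").readFileSync("./test.html", "utf8"));
})();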

The code above will output the following:

{
  card_num: '1',
  player: 'Kevin Garnett',
  column_nine: '1-0',
  column_ten: '0--',
  total: '100'
}
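The answer logs only array[2]. To get all four cards at once, one option is to key every row by its card number. This sketch assumes each row has already been reduced to { card_num, player, cells }, where cells is the per-column span text that mini_array holds (calling .get() on a cheerio result gives a plain array), and it reuses the answer's column positions 13, 14 and 15, which are specific to this page's layout. indexByCardNumber is a hypothetical name.

```javascript
// Column positions taken from the answer (13, 14, 15); they are an
// assumption tied to this page's layout and may differ on other pages.
const COLS = { nine: 13, ten: 14, total: 15 };

// Builds { "1": { player, column_nine, column_ten, total }, ... } from parsed
// rows, skipping rows without a card number (e.g. header rows).
function indexByCardNumber(rows) {
  const byCard = {};
  for (const { card_num, player, cells } of rows) {
    if (!card_num) continue;
    byCard[card_num] = {
      player,
      column_nine: cells[COLS.nine],
      column_ten: cells[COLS.ten],
      total: cells[COLS.total],
    };
  }
  return byCard;
}
```

Calling indexByCardNumber(rows)["1"] would then return Kevin Garnett's values, and the other three cards are available under their own card numbers.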
