Web Scrape with Puppeteer within a table
I am trying to scrape this page:
https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214
I want to be able to find the grade count for PSA 9 and 10. If you look at the HTML of the page, you will notice that PSA does a very bad job (IMO) of displaying the data. Every TR is a player, and the first TD is a card number. Let's just say I want to get card number 1, which in this case is Kevin Garnett.
There are a total of four cards, so those are the only four cards I want to display.
Here is the code I have:
    const puppeteer = require('puppeteer');
    (async () => {
        const browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto("https://www.psacard.com/Pop/GetItemTable?headingID=172510&categoryID=20019&isPSADNA=false&pf=0&_=1583525404214");
        const tr = await page.evaluate(() => {
            const tds = Array.from(document.querySelectorAll('table tr'))
            return tds.map(td => td.innerHTML)
        });
        const getName = tr.map(name => {
            //const thename = Array.from(name.querySelectorAll('td.card-num'))
            console.log("\n\n"+name+"\n\n");
        })
        await browser.close();
    })();
Each TR gets printed, but I can't seem to dive into those TRs. You can see I have a line commented out; I tried that but got an error. As of right now, I am not getting the data for a player dynamically. The easiest way, I would think, is to create a function that gets a specific card by selecting the row where
    TR -> TD.card-num == 1
for Kevin.
Any help with this would be amazing.
Thanks
Short answer: You can just copy and paste that into Excel and it pastes perfectly.
Long answer: If I'm understanding this correctly, you'll need to map over all of the tr elements and then, within each tr, map over its td elements. I use cheerio as a helper. To complete it with puppeteer, just do:
    html = await page.content()
and then pass html into the cleaner I've written below:
    const cheerio = require("cheerio");

    // Parse the table HTML and return one object per table row.
    const test = (html) => {
        const $ = cheerio.load(html);
        const array = $("tr").map((index, element) => {
            const card_num = $(element).find(".card-num").text().trim();
            const player = $(element).find("strong").text();
            // Collect the span text of every cell in this row as a plain array.
            const mini_array = $(element).find("td").map((ind, elem) => {
                return $(elem).find("span").text().trim();
            }).get();
            return {
                card_num,
                player,
                column_nine: mini_array[13],
                column_ten: mini_array[14],
                total: mini_array[15]
            };
        }).get();
        console.log(array[2]);
        return array;
    };

    // Call this inside the puppeteer async function, after page.goto():
    //     const html = await page.content();
    //     test(html);
The code above will output the following:
    {
        card_num: '1',
        player: 'Kevin Garnett',
        column_nine: '1-0',
        column_ten: '0--',
        total: '100'
    }
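As for the commented-out line in the question: page.evaluate hands back serialised values (here, innerHTML strings), and a string has no querySelectorAll, so the DOM lookup has to happen inside the page. Below is a rough sketch of the "TR -> TD.card-num == 1" idea done that way, without cheerio. The names findCard and scrapeCard are made up for illustration, and the column positions 13 and 14 are simply borrowed from the cleaner above, so treat them as assumptions about this particular table rather than anything guaranteed.

```javascript
// Hypothetical helper: find the row whose first cell matches the card number.
// `rows` is an array of rows, each row an array of trimmed cell strings.
function findCard(rows, cardNum) {
    const row = rows.find((cells) => cells[0] === String(cardNum));
    if (!row) return null;
    return {
        card_num: row[0],
        // Positions 13 and 14 mirror the cheerio cleaner above; they are an
        // assumption about this table's layout, not a documented contract.
        grade_nine: row[13],
        grade_ten: row[14],
    };
}

// Hypothetical wrapper: load the page, read every row's cell text inside the
// browser, then pick the requested card on the Node side.
async function scrapeCard(url, cardNum) {
    // Required lazily so findCard can be used without puppeteer installed.
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto(url);
        // querySelectorAll only exists inside page.evaluate, so each row is
        // reduced to plain strings before it is returned to Node.
        const rows = await page.evaluate(() =>
            Array.from(document.querySelectorAll('table tr'), (tr) =>
                Array.from(tr.querySelectorAll('td'), (td) => td.textContent.trim())
            )
        );
        return findCard(rows, cardNum);
    } finally {
        await browser.close();
    }
}
```

Something like scrapeCard(url, 1).then(console.log), with url set to the page from the question, should then print the entry for card number 1, or null if no row's first cell matches. Rows without td cells (header rows, for example) come back as empty arrays and never match.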