Want to scrape a table using Puppeteer. How can I get all rows, iterate through the rows, and then get the td's for each row?
I have Puppeteer set up, and I was able to get all of the rows using:
let rows = await page.$$eval('#myTable tr', row => row);
Now, for each row, I want to get the td's and then get the innerText from those.
Basically I want to do this:
var tds = myRow.querySelectorAll("td");
where myRow is a table row, but with Puppeteer.
One way to achieve this is to use evaluate, which first gets an array of all the TDs and then returns the innerText of each TD:
const puppeteer = require('puppeteer');
const html = `
<html>
<body>
<table>
<tr><td>One</td><td>Two</td></tr>
<tr><td>Three</td><td>Four</td></tr>
</table>
</body>
</html>`;
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);

  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('table tr td'));
    return tds.map(td => td.innerText);
  });

  // You will now have an array of strings
  // [ 'One', 'Two', 'Three', 'Four' ]
  console.log(data);
  // One
  console.log(data[0]);

  await browser.close();
})();
You could also use something like:
const data = await page.$$eval('table tr td', tds => tds.map((td) => {
  return td.innerText;
}));
//[ 'One', 'Two', 'Three', 'Four' ]
console.log(data);
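Both snippets above return one flat list, so the row boundaries are lost. If every row is known to have the same number of cells, the flat list can be regrouped afterwards. This is only a minimal sketch: chunkRows is a hypothetical helper name, not part of Puppeteer, and it assumes a fixed column count with no colspan or rowspan.

```javascript
// Hypothetical helper (not a Puppeteer API): regroup a flat list of cell
// texts into rows. Assumes every row has exactly `columnCount` cells.
function chunkRows(cells, columnCount) {
  if (!Number.isInteger(columnCount) || columnCount < 1) {
    throw new RangeError('columnCount must be a positive integer');
  }
  const rows = [];
  for (let i = 0; i < cells.length; i += columnCount) {
    rows.push(cells.slice(i, i + columnCount));
  }
  return rows;
}

// The flat result from the example table above, which has two columns.
const flat = ['One', 'Two', 'Three', 'Four'];
console.log(chunkRows(flat, 2));
// [ [ 'One', 'Two' ], [ 'Three', 'Four' ] ]
```

If the column count can vary from row to row, it is probably safer to collect the cells per row inside the page instead of regrouping them afterwards.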
You can also scrape the innerText into a two-dimensional array representing your table:
[
['A1', 'B1', 'C1'], // Row 1
['A2', 'B2', 'C2'], // Row 2
['A3', 'B3', 'C3'] // Row 3
]
const result = await page.$$eval('#example-table tr', rows => {
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return Array.from(columns, column => column.innerText);
  });
});
console.log(result[1][2]); // "C2"
Or, equivalently, with page.evaluate():
const result = await page.evaluate(() => {
  const rows = document.querySelectorAll('#example-table tr');
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return Array.from(columns, column => column.innerText);
  });
});
console.log(result[1][2]); // "C2"
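Once the table is available as a two-dimensional array, it is often handier to work with objects keyed by column name. The sketch below shows one way to do that in plain JavaScript. rowsToObjects is a hypothetical helper and the sample data is made up. It assumes the first inner array holds the column names. Note that the selectors above only collect td cells, so a header row built from th cells would come back as an empty array unless the selector is widened, for example to 'td, th'.

```javascript
// Hypothetical helper (not a Puppeteer API): convert a 2D array of cell
// texts into objects keyed by the values of the first row.
function rowsToObjects(table) {
  const [header = [], ...body] = table;
  return body.map(row =>
    Object.fromEntries(header.map((name, i) => [name, row[i]]))
  );
}

// Made-up sample data in the same shape as the scraped result.
const table = [
  ['Name', 'City'],      // header row (assumption: it was scraped too)
  ['Ada', 'London'],
  ['Linus', 'Helsinki']
];

const records = rowsToObjects(table);
console.log(records[1].City); // Helsinki
```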
Two-dimensional array as a one-liner:
let results = await page.$eval('table tbody', tbody => [...tbody.rows].map(r => [...r.cells].map(c => c.innerText)))
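For completeness, here is a sketch that stays closer to the question's myRow.querySelectorAll("td") idea by iterating over row handles on the Node side. As far as I can tell, the page.$$eval('#myTable tr', row => row) call from the question does not give usable rows, because DOM elements cannot be serialized back to Node; page.$$() returns element handles instead, and each handle has its own $$eval scoped to that element. scrapeRowByRow is a hypothetical function name, and it assumes `page` is an already opened Puppeteer Page.

```javascript
// Sketch: iterate over the row handles and read the cells of each row.
// `page` is assumed to be a Puppeteer Page; `tableSelector` selects the table.
async function scrapeRowByRow(page, tableSelector) {
  // One ElementHandle per <tr> inside the table.
  const rowHandles = await page.$$(`${tableSelector} tr`);
  const result = [];
  for (const rowHandle of rowHandles) {
    // Runs in the browser, scoped to this row, much like
    // myRow.querySelectorAll('td') followed by reading innerText.
    const cells = await rowHandle.$$eval('td', tds => tds.map(td => td.innerText));
    result.push(cells);
  }
  return result;
}

// Usage inside an async function, after the page has loaded:
//   const rows = await scrapeRowByRow(page, '#myTable');
//   console.log(rows[0]); // cell texts of the first row
```

This makes one round trip to the browser per row, so for large tables the single page.evaluate() or page.$$eval() call shown earlier is likely to be faster.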