简体   繁体   English

想用 Puppeteer 刮桌子。 如何获取所有行,遍历行,然后为每一行获取“td”?

[英]Want to scrape table using Puppeteer. How can I get all rows, iterate through rows, and then get "td's" for each row?

I have Puppeteer setup, and I was able get all of the rows using:我有 Puppeteer 设置,我可以使用以下方法获取所有行:

let rows = await page.$$eval('#myTable tr', row => row);

Now I want for each row to get " td 's" and then get the innerText from those.现在我希望每一行都获得“ td ”,然后从中获得innerText

Basically I want to do this:基本上我想这样做:

var tds = myRow.querySelectorAll("td");

Where myRow is a table row, with Puppeteer.其中myRow是一个表格行,使用 Puppeteer。

One way to achieve this is to use evaluate that first gets an array of all the TD's then returns the textContent of each TD实现此目的的一种方法是使用评估,它首先获取所有TD's数组,然后返回每个TD的 textContent

const puppeteer = require('puppeteer');

const html = `
<html>
    <body>
      <table>
      <tr><td>One</td><td>Two</td></tr>
      <tr><td>Three</td><td>Four</td></tr>
      </table>
    </body>
</html>`;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);

  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('table tr td'))
    return tds.map(td => td.innerText)
  });

  //You will now have an array of strings
  //[ 'One', 'Two', 'Three', 'Four' ]
  console.log(data);
  //One
  console.log(data[0]);
  await browser.close();
})();

You could also use something like:-你也可以使用类似的东西: -

const data = await page.$$eval('table tr td', tds => tds.map((td) => {
  return td.innerText;
}));

//[ 'One', 'Two', 'Three', 'Four' ]
console.log(data);

Two-Dimensional Array Method二维数组法

You can also scrape the innerText into a two-dimensional array representing your table.您还可以将innerText一个代表您的表格的二维数组中。

[
  ['A1', 'B1', 'C1'], // Row 1
  ['A2', 'B2', 'C2'], // Row 2
  ['A3', 'B3', 'C3']  // Row 3
]

page.$$eval() page.$$eval()

const result = await page.$$eval('#example-table tr', rows => {
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return Array.from(columns, column => column.innerText);
  });
});

console.log(result[1][2]); // "C2"

page.evaluate() page.evaluate()

const result = await page.evaluate(() => {
  const rows = document.querySelectorAll('#example-table tr');
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return Array.from(columns, column => column.innerText);
  });
});

console.log(result[1][2]); // "C2"

二维数组单线:

let results = await page.$eval('table tbody', tbody => [...tbody.rows].map(r => [...r.cells].map(c => c.innerText)))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM