
Get all plain text with Puppeteer

I can get the full HTML of a page with Puppeteer, but how can I get only the plain text, without the tags?

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  console.log(await page.content()); // Prints the full HTML, tags included
  await browser.close();
})();

I haven't tried it, but $eval may work for you:

await page.$eval('*', el => el.innerText);
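
To show where that one-liner fits, here is a minimal sketch that swaps page.content() in the question's script for the $eval call. The helper names getVisibleText and printPageText are made up for illustration, and the waitUntil: 'networkidle2' option is an assumption about when the page has finished rendering, not something the answer specifies.

```javascript
// Hypothetical helper: returns the rendered text of the whole document.
async function getVisibleText(page) {
  // '*' matches the first element in document order, which is <html>,
  // so innerText here covers the entire page.
  return page.$eval('*', (el) => el.innerText);
}

// Hypothetical wrapper around the script from the question.
async function printPageText(url) {
  // Loaded lazily so getVisibleText can be reused without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumption: wait for network activity to settle before reading the text.
    await page.goto(url, { waitUntil: 'networkidle2' });
    console.log(await getVisibleText(page));
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}

// Usage: printPageText('https://google.com');
```

Since innerText reflects what is rendered, text hidden by CSS is generally left out; textContent could be used in the same callback if hidden text is wanted as well.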

I've gathered a few possible variants in my article, "How to get all text from a webpage using Puppeteer?"

To keep things short:

  1. innerText variant. Works with most webpages, but not all of them.
await page.$eval('*', el => el.innerText);
  2. Select text variant. Selects the whole document and reads back the selected text. Works with more webpages.
await page.$eval('*', (el) => {
    const selection = window.getSelection();
    const range = document.createRange();
    range.selectNode(el);
    selection.removeAllRanges();
    selection.addRange(range);
    return window.getSelection().toString();
});
  3. Use a third-party library of your choice (like html-to-text).
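
The library-based variant is not spelled out above, so here is a rough sketch of one way it could look. It assumes the html-to-text package, whose convert function takes an HTML string and an options object. The helper name getTextViaLibrary is hypothetical, and the converter is passed in as an argument so a different library can be substituted.

```javascript
// Hypothetical helper: fetch the serialized HTML, then let a library strip the markup.
// `convert` is any function of the form (html, options) => text.
async function getTextViaLibrary(page, convert) {
  // page.content() returns the full HTML, as in the question.
  const html = await page.content();
  // Assumption: `wordwrap: false` is the html-to-text option that disables line wrapping.
  return convert(html, { wordwrap: false });
}

// Usage, assuming `npm install puppeteer html-to-text`:
//   const { convert } = require('html-to-text');
//   const text = await getTextViaLibrary(page, convert);
```

Unlike the first two variants, this one works on the HTML string in Node rather than on the rendered page, so the result depends on the library's formatting rules, not on what the browser displays.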
