I can get all the code of a page with Puppeteer, but how can I get only the plain text, without tags?
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com');
  console.log(await page.content()); // Get all code
  await browser.close();
})();
I've gathered a few possible variants in my article, "How to get all text from a webpage using Puppeteer?"
To keep things short:
innerText variant. Works with most webpages, but not all of them:
await page.$eval('*', el => el.innerText);
Selection variant. Selects the whole page and reads the selection back as text:
await page.$eval('*', (el) => {
  const selection = window.getSelection();
  const range = document.createRange();
  range.selectNode(el);
  selection.removeAllRanges();
  selection.addRange(range);
  return window.getSelection().toString();
});
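The two in-page variants above can be wrapped in small helpers and tried in order. This is only a sketch: the function names (getTextViaInnerText, getTextViaSelection, getPageText, fetchPageText) are made up for illustration, and falling back to the selection variant when innerText comes back empty is my own assumption about when the first variant "does not work", not something the article specifies.

```javascript
// Sketch only: helper names are hypothetical, not part of Puppeteer.

// Variant 1: read innerText of the first element matching '*' (the root <html>).
async function getTextViaInnerText(page) {
  return page.$eval('*', (el) => el.innerText);
}

// Variant 2: select the whole document and read the selection as a string.
async function getTextViaSelection(page) {
  return page.$eval('*', (el) => {
    const selection = window.getSelection();
    const range = document.createRange();
    range.selectNode(el);
    selection.removeAllRanges();
    selection.addRange(range);
    return selection.toString();
  });
}

// Assumption: an empty or whitespace-only result means variant 1 failed,
// so the selection variant is tried next.
async function getPageText(page) {
  const text = await getTextViaInnerText(page);
  if (text && text.trim() !== '') {
    return text;
  }
  return getTextViaSelection(page);
}

// Opens the URL in a headless browser and returns the page text.
async function fetchPageText(url) {
  // Loaded lazily so the helpers above work with any Puppeteer page object.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait for network activity to settle so client-rendered text is present.
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await getPageText(page);
  } finally {
    // Close the browser even if navigation or extraction throws.
    await browser.close();
  }
}
```

It could be called as fetchPageText('https://google.com').then(console.log). Closing the browser in a finally block keeps a failed page load from leaving a Chromium process running.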
html-to-text variant. Uses the external html-to-text library to convert the page HTML to text.
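For the html-to-text variant, a minimal sketch might look like the following. It assumes the html-to-text package is installed (npm install html-to-text) and exposes the convert function, as recent versions do; htmlToPlainText is a hypothetical helper name, and the wordwrap setting is just one reasonable choice.

```javascript
// Sketch only: assumes a recent html-to-text release that exports convert().
// htmlToPlainText is a hypothetical helper name.
function htmlToPlainText(html) {
  // Required here so a missing package surfaces only when the helper is called.
  const { convert } = require('html-to-text');
  // wordwrap: false disables the library's default line wrapping.
  return convert(html, { wordwrap: false });
}
```

With Puppeteer, the input would be the string from the question's code, for example htmlToPlainText(await page.content()). Unlike the two in-page variants, the conversion runs in Node.js rather than in the browser, so the result follows the library's formatting rules instead of the browser's rendering.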