Scrape Text From Iframe
How would I scrape text from an iframe with Puppeteer?
As a simple reproducible example, scrape "This is a paragraph" from the iframe at this URL:
https://www.w3schools.com/js/tryit.asp?filename=tryjs_events
To scrape an iframe's text in Puppeteer, you can use page.evaluate to run JavaScript in the context of the page and have that JavaScript return the iframe's contents.
The steps to do so are:
1. Grab the iframe element.
2. Get the iframe's document object.
3. Use the document object to read the iframe's HTML.
I wrote this program that grabs "This is a paragraph" from the link you provided:
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.w3schools.com/js/tryit.asp?filename=tryjs_events');

  const iframeParagraph = await page.evaluate(() => {
    // grab the iframe element from the parent page
    const iframe = document.getElementById("iframeResult");
    // grab iframe's document object
    const iframeDoc = iframe.contentDocument || iframe.contentWindow.document;
    // read the target element from the iframe's document
    const iframeP = iframeDoc.getElementById("demo");
    return iframeP.innerHTML;
  });

  console.log(iframeParagraph); // prints "This is a paragraph"
  await browser.close();
})();
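The page.evaluate approach above depends on the parent page being allowed to read iframe.contentDocument, which browsers block for cross-origin iframes. As a rough alternative sketch, Puppeteer's own frame API (elementHandle.contentFrame() and frame.$eval) can read from inside the frame directly. The helper name getIframeText and its parameters are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper: returns the text of `innerSelector` inside the iframe
// matched by `iframeSelector`, using Puppeteer's frame API.
async function getIframeText(page, iframeSelector, innerSelector) {
  // Wait for the <iframe> element itself to appear in the parent page.
  const iframeHandle = await page.waitForSelector(iframeSelector);
  // Resolve the element handle to the Frame it hosts.
  const frame = await iframeHandle.contentFrame();
  if (!frame) {
    throw new Error(`No frame found for ${iframeSelector}`);
  }
  // Wait for the target element inside the frame, then read its text.
  await frame.waitForSelector(innerSelector);
  return frame.$eval(innerSelector, (el) => el.textContent.trim());
}

// Usage, assuming `page` came from browser.newPage() and page.goto() has finished:
//   const text = await getIframeText(page, "#iframeResult", "#demo");
//   console.log(text);
```

Because the selector is evaluated inside the frame itself, this version does not need to reach through contentDocument from the parent page.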
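I know this question already has an answer, but in case anyone wants another approach: you can get the content from the iframe and use cheerio to traverse the elements and get the text of any element you want. You can find it here.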
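A minimal sketch of what that cheerio approach might look like, assuming a Puppeteer Frame for the iframe is already in hand. The helper name getFrameText is hypothetical, and cheerio's load function is passed in as a parameter (for example require("cheerio").load) instead of being imported here, so the sketch does not assume how cheerio is installed.

```javascript
// Hypothetical helper: serializes the frame's HTML and queries it with cheerio.
// `load` is expected to be cheerio's `load` function, e.g. require("cheerio").load.
async function getFrameText(frame, selector, load) {
  // Frame.content() resolves to the frame's full HTML as a string.
  const html = await frame.content();
  // Parse the HTML outside the browser and get a jQuery-like query function.
  const $ = load(html);
  // .text() returns the combined text of all elements matching the selector.
  return $(selector).text().trim();
}

// Usage, assuming `page` has already navigated to the URL from the question:
//   const handle = await page.$("#iframeResult");
//   const frame = await handle.contentFrame();
//   const text = await getFrameText(frame, "#demo", require("cheerio").load);
```

Parsing a snapshot of the HTML this way is convenient when many elements need to be traversed, though it only sees the markup as it was when frame.content() was called.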