
Scrape Text From Iframe

How would I scrape text from an iframe with Puppeteer?

As a simple reproducible example, scrape the text This is a paragraph from the iframe at this URL:

https://www.w3schools.com/js/tryit.asp?filename=tryjs_events

To scrape an iframe's text in Puppeteer, you can use page.evaluate to run JavaScript in the context of the page and return the iframe's contents.

The steps to do so are:

  1. Grab the iframe element.
  2. Get the iframe's document object.
  3. Use the document object to read the iframe's HTML.

I wrote this program that grabs This is a paragraph from the link you provided:

const puppeteer = require("puppeteer");

(async () => {

    const browser = await puppeteer.launch();

    const page = await browser.newPage();
    await page.goto('https://www.w3schools.com/js/tryit.asp?filename=tryjs_events');

    const iframeParagraph = await page.evaluate(() => {

        const iframe = document.getElementById("iframeResult");

        // grab iframe's document object
        const iframeDoc = iframe.contentDocument || iframe.contentWindow.document;

        const iframeP = iframeDoc.getElementById("demo");

        return iframeP.innerHTML;
    });

    console.log(iframeParagraph); // prints "This is a paragraph"

    await browser.close();

})();
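
The program above reaches into the iframe through contentDocument, which the browser only exposes when the iframe is same-origin with the outer page. As a possible alternative, the sketch below goes through Puppeteer's Frame API instead: it resolves the iframe element to its Frame with elementHandle.contentFrame() and evaluates inside that frame with frame.$eval. The names readIframeText and main are hypothetical helpers introduced for illustration, and the selectors assume the same #iframeResult and #demo ids used above. Treat it as a rough sketch rather than a definitive implementation.

```javascript
// Sketch: read text from inside an iframe via Puppeteer's Frame API.
// readIframeText and main are hypothetical helper names, not part of Puppeteer.

async function readIframeText(page, iframeSelector, innerSelector) {
    // Wait until the <iframe> element exists in the outer page.
    const iframeHandle = await page.waitForSelector(iframeSelector);

    // Resolve the element handle to the Frame it hosts.
    const frame = await iframeHandle.contentFrame();
    if (!frame) {
        throw new Error(`No frame found for selector: ${iframeSelector}`);
    }

    // Run the callback inside the frame's own document and return trimmed text.
    return frame.$eval(innerSelector, (el) => el.textContent.trim());
}

async function main() {
    // Required lazily so the helper above can be loaded without a browser installed.
    const puppeteer = require("puppeteer");
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto("https://www.w3schools.com/js/tryit.asp?filename=tryjs_events");
        console.log(await readIframeText(page, "#iframeResult", "#demo"));
    } finally {
        // Close the browser even if navigation or scraping throws.
        await browser.close();
    }
}

// To try it against the live page: main().catch(console.error);
```

Keeping the helper separate from the browser setup means it can be reused with any page object, and the try/finally ensures the browser is closed even when the selector is never found.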

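I know this question already has an answer, but in case someone wants a different approach: you can get the content from the iframe and use cheerio to traverse the elements and get the text of any element you want - you can find it here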


 