
Scrape Text From Iframe

How would I scrape text from an iframe with Puppeteer?

As a simple reproducible example, scrape the text "This is a paragraph" from the iframe at this URL:

https://www.w3schools.com/js/tryit.asp?filename=tryjs_events

To scrape an iframe's text in Puppeteer, you can use page.evaluate to run JavaScript in the context of the page and return the iframe's contents.

The steps to do so are:

  1. Grab the iframe element.
  2. Get the iframe's document object.
  3. Use the document object to read the iframe's HTML.

I wrote this program, which grabs "This is a paragraph" from the link you provided:

const puppeteer = require("puppeteer");

(async () => {

    const browser = await puppeteer.launch();

    const page = await browser.newPage();
    await page.goto('https://www.w3schools.com/js/tryit.asp?filename=tryjs_events');

    const iframeParagraph = await page.evaluate(() => {

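        // 1. grab the iframe element in the parent page
        const iframe = document.getElementById("iframeResult");

        // 2. grab the iframe's document object (only reachable for same-origin iframes)
        const iframeDoc = iframe.contentDocument || iframe.contentWindow.document;

        // 3. read the target element's HTML from inside the iframe
        const iframeP = iframeDoc.getElementById("demo");

        return iframeP.innerHTML;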
    });

    console.log(iframeParagraph); // prints "This is a paragraph"

    await browser.close();

})();
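
If the iframe is served from a different origin than the parent page, the browser does not let page.evaluate reach its document, so the program above would not work there. One possible alternative is to go through Puppeteer's Frame object instead. This is only a sketch: it assumes the same #iframeResult and #demo selectors as the program above, and readIframeText and scrapeWithFrameApi are made-up names, not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the text of `innerSelector`
// inside the iframe matched by `iframeSelector`.
async function readIframeText(page, iframeSelector, innerSelector) {
    // wait for the <iframe> element itself to show up in the parent page
    const iframeHandle = await page.waitForSelector(iframeSelector);

    // resolve the element handle to the Frame it hosts
    const frame = await iframeHandle.contentFrame();
    if (!frame) {
        throw new Error(`No frame found for ${iframeSelector}`);
    }

    // wait for the target element inside the frame, then read its text
    await frame.waitForSelector(innerSelector);
    return frame.$eval(innerSelector, (el) => el.textContent.trim());
}

// Example wiring, assuming the URL and selectors from the program above.
async function scrapeWithFrameApi() {
    // required here so the helper above has no hard dependency on puppeteer
    const puppeteer = require("puppeteer");
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto("https://www.w3schools.com/js/tryit.asp?filename=tryjs_events");
        console.log(await readIframeText(page, "#iframeResult", "#demo"));
    } finally {
        // close the browser even if the scrape throws
        await browser.close();
    }
}
```

Calling scrapeWithFrameApi().catch(console.error) runs the sketch against the URL from the question. The waitForSelector calls are there because the iframe and its contents may not be ready the moment page.goto resolves.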

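I know this question already has an answer, but if anyone wants to take a different approach, you can get the content from the iframe and use cheerio to traverse the elements and get the text of any element you want. You can find it here.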
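
The cheerio route could look roughly like the sketch below. It assumes cheerio is installed alongside puppeteer and reuses the #iframeResult and #demo selectors from the accepted answer. iframeTextWithCheerio and scrapeWithCheerio are made-up names, and the optional load parameter is only there so a different HTML parser can be swapped in.

```javascript
// Hypothetical helper: copies the iframe's HTML out of the browser and queries it with cheerio.
// `load` is optional and defaults to cheerio's own `load` function.
async function iframeTextWithCheerio(frame, selector, load) {
    // resolve cheerio lazily, only when no loader was supplied
    const loadHtml = load || require("cheerio").load;

    // Frame.content() resolves to the frame's full HTML as a string
    const html = await frame.content();

    // parse the snapshot and read the text of the matching element(s)
    const $ = loadHtml(html);
    return $(selector).text().trim();
}

// Example wiring, assuming the URL and selectors from the accepted answer.
async function scrapeWithCheerio() {
    const puppeteer = require("puppeteer");
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto("https://www.w3schools.com/js/tryit.asp?filename=tryjs_events");

        // locate the <iframe> element and resolve it to its Frame
        const iframeHandle = await page.waitForSelector("#iframeResult");
        const frame = await iframeHandle.contentFrame();

        console.log(await iframeTextWithCheerio(frame, "#demo"));
    } finally {
        // close the browser even if the scrape throws
        await browser.close();
    }
}
```

Calling scrapeWithCheerio().catch(console.error) runs it. Note that cheerio works on a static snapshot of the HTML, so anything the iframe's scripts change after frame.content() is called will not show up in the result.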
