Get title from newly opened page with Puppeteer

I am trying to switch to a newly opened tab and scrape the title of that page with Puppeteer.

This is what I have:

// use puppeteer
const puppeteer = require('puppeteer');

//set wait length in ms: 1000ms = 1sec
const short_wait_ms = 1000

async function run() {
    const browser = await puppeteer.launch({
        headless: false, timeout: 0});
    const page = await browser.newPage();

    await page.goto('https://biologyforfun.wordpress.com/2017/04/03/interpreting-random-effects-in-linear-mixed-effect-models/');

    // second page DOM elements
    const CLICKHERE_SELECTOR = '#post-2068 > div > div.entry-content > p:nth-child(2) > a:nth-child(1)';

    // main page
    await page.waitFor(short_wait_ms);
    await page.click(CLICKHERE_SELECTOR);


    // new tab opens - move to new tab
    let pages = await browser.pages();

    //go to the newly opened page

    //console.log title -- Generalized Linear Mixed Models in Ecology and in R

}

run();

I can't figure out how to use browser.pages() to start working on the new page.

According to the Puppeteer documentation:

page.title()

Shortcut for page.mainFrame().title().

Therefore, you should use page.title() to get the title of the newly opened page.

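Alternatively, you can read the title through the main frame directly, which may be slightly faster. Note that this relies on private, underscore-prefixed properties that are not part of the documented API and may change between versions: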

page._frameManager._mainFrame.evaluate(() => document.title)

Note: Make sure to use the await operator when calling page.title(), because it returns a Promise. The page's title element must also have loaded before Puppeteer can read its content.
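Since page.title() has to be called on the page object of the new tab, here is a minimal sketch of one way to obtain that object. It assumes the click opens exactly one new tab and that no other target (such as a service worker) is created first. The helper name getNewTabTitle is hypothetical; browser.once('targetcreated'), target.page(), page.waitForSelector() and page.title() are existing Puppeteer calls.

```javascript
// Hypothetical helper: clicks a link that opens a new tab and
// returns the title of that tab.
// `browser` and `page` are the objects returned by puppeteer.launch()
// and browser.newPage(); `selector` matches the link to click.
async function getNewTabTitle(browser, page, selector) {
  // Start listening before clicking so the 'targetcreated' event
  // cannot fire before the listener is attached.
  const newTarget = new Promise((resolve) => {
    browser.once('targetcreated', resolve);
  });

  await page.click(selector);

  // Assumption: the first target created after the click is the new tab.
  const target = await newTarget;
  const newPage = await target.page(); // resolves to null for non-page targets
  if (!newPage) {
    throw new Error('The newly created target is not a page');
  }

  // Wait until the <title> element exists in the new tab's DOM.
  await newPage.waitForSelector('title');
  return newPage.title();
}
```

Inside run() from the question, this could be called as `console.log(await getNewTabTitle(browser, page, CLICKHERE_SELECTOR));`.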

You shouldn't need to move to the new tab.

To get the title of any page you can use:

const pageTitle = await page.title();

Also, after you click something and are waiting for the new page to load, you should wait for the load event or for the network to be idle:

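// Wait for the navigation to finish.
// The original 'networkidle' value and networkIdleTimeout option come from
// pre-1.0 Puppeteer; later versions use 'networkidle0' or 'networkidle2'.
await page.waitForNavigation({waitUntil: 'networkidle0'});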

Check the docs: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagewaitfornavigationoptions
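One detail worth illustrating: if waitForNavigation is only called after the click has resolved, a fast navigation may already be over and the wait can time out. A common way around this is to start both together, sketched below. This assumes the link navigates within the same tab; the helper name clickAndGetTitle is hypothetical, and the 'load' value corresponds to the load event mentioned above.

```javascript
// Hypothetical helper: clicks a link that navigates the current tab,
// waits for the load event, and returns the title of the loaded page.
async function clickAndGetTitle(page, selector) {
  // Start waiting for the navigation before the click is issued,
  // so the navigation cannot complete unobserved.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'load' }),
    page.click(selector),
  ]);

  // page.title() returns a Promise, so the caller must await this result.
  return page.title();
}
```

For example, `const pageTitle = await clickAndGetTitle(page, CLICKHERE_SELECTOR);` would replace the separate click, wait, and title calls.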
