
How to handle multiple tabs in puppeteer-cluster[CONCURRENCY_BROWSER]?

I'm trying to scrape 3 URLs under the following conditions:

  1. Each URL needs to run in a separate browser.

  2. Each URL's page may contain 2 or more links to click.

  3. The links should be opened in new tabs of the respective browser (in parallel), and I need to switch to each tab and scrape its content.

In other words, I am trying to open a URL in a browser, fetch the links on the page, open one new tab per fetched link in the same browser, switch to each tab, click a button in it, and get the confirmation message.

I also need to run the 3 URLs in parallel.

I have tried the CONCURRENCY_BROWSER option to run the URLs in parallel, but I am not able to open a link in a new tab. Any suggestions on how I can manipulate tabs in puppeteer-cluster?

What I need is:

const { Cluster } = require('puppeteer-cluster');

async function test() {
    const cluster = await Cluster.launch({
        puppeteerOptions: {
            headless: false,
            defaultViewport: null,
        },
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 5,
        skipDuplicateUrls: true,
        timeout: 240000,
    });

    // Start the cluster task for a set of URLs from the cluster queue.

    await page.goto(url);

    // On visiting the page, I retrieve 2 or more links and store them in an array.

    let linksArray = [...subUrl];

    // Load each sub-URL in a new tab of the same browser.

    await cluster.newPage(); // this is the call that fails

    // Take a screenshot of the sub-URL.

    await page.screenshot(suburl);
}

TypeError: cluster.newPage is not a function

In plain Puppeteer, I used to open a new tab with await browser.newPage().

You need to be more specific about your problem and provide some code. Go step by step and ask when you get to the point where it breaks.

Here is an example of opening multiple tabs in the same browser instance:

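const puppeteer = require('puppeteer');

async function init() {
    const browser = await puppeteer.launch({
        headless: false,
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
    });

    // Open all three tabs in parallel and wait until each one has loaded.
    await Promise.all([
        open_tab('http://example1.com', browser),
        open_tab('http://example2.com', browser),
        open_tab('http://example3.com', browser),
    ]);
}

// Open a URL in a new tab of the given browser and return the page.
async function open_tab(url, browser) {
    const page = await browser.newPage();
    await page.setViewport({ width: 1200, height: 1000 });
    await page.goto(url);
    return page;
}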

Author of puppeteer-cluster here. It is not easily possible to re-use the same browser. But, you can define one task with multiple page.goto calls inside like this:

const cluster = await Cluster.launch(/* ... */);

// define the task and reuse the window 
await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    const secondUrl = /* ... */; // extract another URL somehow
    await page.goto(secondUrl);
    await page.screenshot(/* ... */);
});

// queue your initial links
cluster.queue('http://...');
cluster.queue('http://...');
// ...
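
If separate tabs are really needed, one possible workaround (not part of the answer above, so treat it as an untested sketch rather than the library's recommended approach) is to ask the task's page for its owning browser with page.browser() and open the extra tabs there. With CONCURRENCY_BROWSER that browser belongs to the current worker. The selector, the returned object shape, and the names scrapeWithTabs and main are assumptions made up for illustration.

```javascript
// Sketch only: opens each extracted link in a new tab of the worker's own browser.
// LINK_SELECTOR and the shape of the returned objects are illustrative assumptions.
const LINK_SELECTOR = 'a[href]'; // hypothetical selector for the links to follow

// Task handler in the shape puppeteer-cluster passes to cluster.task().
async function scrapeWithTabs({ page, data: url }) {
    await page.goto(url);

    // Collect absolute hrefs from the page and drop duplicates.
    const hrefs = await page.$$eval(LINK_SELECTOR, (anchors) => anchors.map((a) => a.href));
    const links = [...new Set(hrefs)];

    // page.browser() returns the Browser that owns this page.
    const browser = page.browser();

    // One tab per link, processed in parallel.
    return Promise.all(links.map(async (link) => {
        const tab = await browser.newPage();
        try {
            await tab.goto(link);
            // Click the button and read the confirmation message here.
            return { link, title: await tab.title() };
        } finally {
            await tab.close(); // the cluster does not track these tabs, so close them here
        }
    }));
}

// Wiring into the cluster; needs the puppeteer-cluster and puppeteer packages.
async function main(startUrls) {
    const { Cluster } = require('puppeteer-cluster');
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 3,
    });
    await cluster.task(scrapeWithTabs);
    // cluster.execute() resolves with the value returned by the task.
    const results = await Promise.all(startUrls.map((u) => cluster.execute(u)));
    await cluster.idle();
    await cluster.close();
    return results;
}
```

Call main with an array of start URLs to run it. Keep in mind that the cluster's timeout applies to the whole task, including the time spent in the extra tabs.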
