I'm trying to scrape 3 URLs under the following conditions:
Each URL needs to run in a separate browser.
Each URL's page may contain 2 or more links to click.
Open each link in a new tab of the respective browser (in parallel), switch to it, and scrape the content.
In other words, I am trying to open a URL in a browser, fetch the links on the page, open one new tab per fetched link in the same browser, switch to each tab, click a button in it, and get the confirmation message.
I also need to run the 3 URLs in parallel.
I have tried the CONCURRENCY_BROWSER option to run the URLs in parallel, but I am not able to open a link in a new tab. Any suggestions on how I can manipulate tabs in puppeteer-cluster?
What I need is:
async function test() {
  const cluster = await Cluster.launch({
    puppeteerOptions: {
      headless: false,
      defaultViewport: null,
    },
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 5,
    skipDuplicateUrls: true,
    timeout: 240000,
  });
  // initiate the cluster task for a set of urls from the cluster queue;
  await page.goto(url);
  // on visiting the page i retrieve 2 or more links and store it in a array
  let linksArray = [...subUrl];
  // load suburl in a new tab respectively of the same browser
  await cluster.newPage()
  // screenshot suburl
  await page.screenshot(suburl)
}
This throws: TypeError: cluster.newPage is not a function
In plain Puppeteer, I used to open a new tab with await browser.newPage().
You need to be more specific about your problem and provide some code. Go step by step and ask when you run into the problem.
Here is an example of opening multiple tabs in the same browser instance:
const puppeteer = require('puppeteer');

async function init() {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  // Open the three tabs concurrently and wait until all of them have loaded
  await Promise.all([
    open_tab('http://example1.com', browser),
    open_tab('http://example2.com', browser),
    open_tab('http://example3.com', browser),
  ]);
}

// Open `url` in a new tab of the given browser instance
async function open_tab(url, browser) {
  const page = await browser.newPage();
  await page.setViewport({ width: 1200, height: 1000 });
  await page.goto(url);
}
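The question also asks to switch to each tab, click a button, and read the confirmation message. A minimal sketch of that per-tab step follows, assuming the button and the message can each be found with a CSS selector. The helper name clickAndReadConfirmation and both selector parameters are hypothetical; only the Puppeteer page methods it calls (bringToFront, waitForSelector, click, $eval) are real.

```javascript
// Hypothetical helper: click a button in an already-open tab and return the
// trimmed text of the confirmation element. Both selectors are placeholders
// that must be adapted to the actual page.
async function clickAndReadConfirmation(tab, buttonSelector, messageSelector) {
  await tab.bringToFront();                   // "switch" to this tab
  await tab.waitForSelector(buttonSelector);  // wait until the button exists
  await tab.click(buttonSelector);
  await tab.waitForSelector(messageSelector); // wait for the confirmation
  const text = await tab.$eval(messageSelector, (el) => el.textContent);
  return text.trim();
}
```

It could be called on the page object inside open_tab, right after page.goto(url). Whether waitForSelector is enough to detect the confirmation depends on how the target site renders it.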
Author of puppeteer-cluster here. It is not easily possible to re-use the same browser. But you can define one task with multiple page.goto calls inside, like this:
const cluster = await Cluster.launch(/* ... */);

// define the task and reuse the window
await cluster.task(async ({ page, data: url }) => {
  await page.goto(url);
  const secondUrl = /* ... */; // extract another URL somehow
  await page.goto(secondUrl);
  await page.screenshot(/* ... */);
});

// queue your initial links
cluster.queue('http://...');
cluster.queue('http://...');
// ...
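If separate tabs are really needed, one possible workaround is to reach the underlying browser from inside the task through Puppeteer's own page.browser() and open the tabs there. This is only a sketch under assumptions: it is not a puppeteer-cluster API, the cluster does not know about these extra tabs (so the task has to close them itself), and it has not been verified against every concurrency mode. The helper name scrapeSubLinksInTabs and the scrapeTab callback are hypothetical.

```javascript
// Hypothetical helper, not part of puppeteer-cluster: open every sub-URL in its
// own tab of the browser that owns `page`, run `scrapeTab` on each tab in
// parallel, and always close the tab afterwards.
async function scrapeSubLinksInTabs(page, subUrls, scrapeTab) {
  const browser = page.browser(); // Puppeteer: the browser this page belongs to
  return Promise.all(
    subUrls.map(async (subUrl) => {
      const tab = await browser.newPage();
      try {
        await tab.goto(subUrl);
        return await scrapeTab(tab, subUrl);
      } finally {
        await tab.close(); // the cluster does not track these tabs
      }
    })
  );
}

// Possible use inside a task (assumed wiring; the 'a' selector is a placeholder):
// await cluster.task(async ({ page, data: url }) => {
//   await page.goto(url);
//   const links = await page.$$eval('a', (as) => as.map((a) => a.href));
//   return scrapeSubLinksInTabs(page, links, (tab) => tab.title());
// });
```

Results come back in the same order as subUrls. With CONCURRENCY_BROWSER each worker has its own browser, so the extra tabs stay within the browser that loaded the initial URL.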