
node.js async/await or generic-pool causes infinite loop?

I am trying to write an automation script for work. It is supposed to use multiple puppeteer instances to process input strings simultaneously. The task queue and the number of puppeteer instances are controlled by the generic-pool package. Strangely, when I run the script on Ubuntu or Debian, it seems to fall into an infinite loop and tries to launch an unlimited number of puppeteer instances, whereas on Windows the output is normal.

const puppeteer = require('puppeteer');
const genericPool = require('generic-pool');
const faker = require('faker');
let options = require('./options');
let i = 0;
let proxies = [...options.proxy];

const pool = genericPool.createPool({
    create: async () => {
        i++;
        console.log(`create instance ${i}`);
        if (!proxies.length) {
            proxies = [...options.proxy];
        }
        let {control = null, proxy} = proxies.pop();
        let instance = await puppeteer.launch({
            headless: true,
            args: [
                `--proxy-server=${proxy}`,
            ]
        });
        instance._own = {
            proxy,
            tor: control,
            numInstance: i,
        };
        return instance;
    },
    destroy: async instance => {
        console.log('destroy instance', instance._own.numInstance);
        await instance.close()
    },
}, {
    max: 3, 
    min: 1, 
});

async function run(emails = []) {
    console.log('Processing', emails.length);
    const promises = emails.map(email => {
        console.log('Processing', email)
        pool.acquire()
            .then(browser => {
                console.log(`${email} handled`)
                pool.destroy(browser);})
    })
    await Promise.all(promises)
    await pool.drain();
    await pool.clear();
}

let emails = [a,b,c,d,e,];
run(emails)

Output

create instance 1
Processing 10
Processing Stacey_Haley52
Processing Polly.Block
create instance 2
Processing Shanny_Hudson59
Processing Vivianne36
Processing Jayda_Ullrich
Processing Cheyenne_Quitzon
Processing Katheryn20
Processing Jamarcus74
Processing Lenore.Osinski
Processing Hobart75
create instance 3
create instance 4
create instance 5
create instance 6
create instance 7
create instance 8
create instance 9

Is it because of my async functions? How can I fix it? I appreciate your help!

Edit 1: modified according to @James's suggestion.

You want to return the promise from your map callback rather than await it. Also, don't await inside the destroy call; return the result so the calls can be chained, e.g.

const promises = emails.map(e => pool.acquire().then(browser => pool.destroy(browser)));

Alternatively, you could get rid of destroy completely, e.g.

pool.acquire().then(b => b.close())
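To illustrate why the return matters, here is a minimal sketch. It uses a hypothetical fakeAcquire function as a stand-in for pool.acquire(), so it runs without puppeteer or generic-pool:

```javascript
// fakeAcquire is a hypothetical stand-in for pool.acquire():
// it resolves with a fake resource after a short delay.
const fakeAcquire = email =>
    new Promise(resolve => setTimeout(() => resolve(`browser for ${email}`), 10));

const emails = ['a@example.com', 'b@example.com', 'c@example.com'];

// Block body without `return`: every callback yields undefined.
const notReturned = emails.map(email => {
    fakeAcquire(email).then(browser => browser.length);
});

// Concise body: the promise chain is the callback's return value.
const returned = emails.map(email =>
    fakeAcquire(email).then(browser => browser.length));

// Promise.all over plain undefined values resolves right away,
// before any of the acquire calls has finished.
Promise.all(notReturned).then(values => console.log('not returned:', values));

// Here Promise.all really waits for every chain to settle.
Promise.all(returned).then(values => console.log('returned:', values));
```

In the question's run function the map callback has a block body without a return, so Promise.all(promises) does not wait for any acquire call, and pool.drain() is reached while the work is still pending.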

The main problem you are trying to solve is this:

It is supposed to use multiple puppeteer instances to process input strings simultaneously.

Promise Queue

A simple promise queue solves this. The p-queue package lets you limit the concurrency to whatever you want; I have used it on several scraping projects.

Here is how you can use it.

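// emails to handle
let emails = [a, b, c, d, e, ];

const puppeteer = require('puppeteer');
const options = require('./options'); // same options file as in the question

// create a promise queue
// (depending on the installed p-queue version you may need
// require('p-queue').default, or an ESM import instead)
const PQueue = require('p-queue');

// create queue with concurrency, ie: how many instances we want to run at once
const queue = new PQueue({
    concurrency: 1
});

// single task processor; the proxy entry and instance number are passed in
// explicitly instead of relying on outer variables
const createInstance = async (email, {control = null, proxy}, numInstance) => {
    let instance = await puppeteer.launch({
        headless: true,
        args: [
            `--proxy-server=${proxy}`,
        ]
    });
    instance._own = {
        proxy,
        tor: control,
        numInstance,
    };
    console.log('email:', email)
    // do the work for this email here, and close the browser when done
    return instance;
}

// add tasks to queue, cycling through the proxy list
emails.forEach((email, index) => {
    const proxyEntry = options.proxy[index % options.proxy.length];
    queue.add(() => createInstance(email, proxyEntry, index + 1));
});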
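To make the idea of a concurrency limit concrete, here is a minimal hand-rolled sketch of a promise queue. It is not p-queue's implementation; createLimiter, stats and fakeJob are hypothetical names used only for illustration, and the jobs are timers rather than real browser launches:

```javascript
// Hypothetical helper: runs at most `concurrency` tasks at the same time.
function createLimiter(concurrency) {
    const waiting = [];
    let active = 0;

    const next = () => {
        if (active >= concurrency || waiting.length === 0) return;
        active++;
        const { task, resolve, reject } = waiting.shift();
        // Start the task now; a synchronous throw becomes a rejection.
        new Promise(done => done(task()))
            .then(resolve, reject)
            .finally(() => {
                active--;
                next(); // a slot is free, start the next waiting task
            });
    };

    // Queue a task; the returned promise settles with the task's result.
    const add = task => new Promise((resolve, reject) => {
        waiting.push({ task, resolve, reject });
        next();
    });

    const stats = () => ({ active, waiting: waiting.length });
    return { add, stats };
}

// Fake jobs that track how many of them run at once.
let current = 0;
let peak = 0;
const fakeJob = email => async () => {
    current++;
    peak = Math.max(peak, current);
    await new Promise(resolve => setTimeout(resolve, 10));
    current--;
    return `${email} handled`;
};

const limiter = createLimiter(2);
const jobs = ['a', 'b', 'c', 'd', 'e'].map(email => limiter.add(fakeJob(email)));

console.log(limiter.stats()); // two jobs running, three still waiting
Promise.all(jobs).then(results => console.log(results, 'peak concurrency:', peak));
```

With a limit of 2, only two jobs are ever in flight no matter how many are added, which is the behaviour the concurrency option provides in the p-queue snippet.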

Generic Pool Infinite Loop Problem

I removed all the puppeteer-related code from your sample and it still produced endless output on the console:

create instance 70326
create instance 70327
create instance 70328
create instance 70329
create instance 70330
create instance 70331
...

If you test it a few times, you will see that it only falls into the loop when something in your code is crashing. The culprit is the pool.acquire() promise, which just re-queues on error.

To find out what is causing the crash, listen for the following events:

pool.on("factoryCreateError", function(err) {
  console.log('factoryCreateError',err);
});

pool.on("factoryDestroyError", function(err) {
  console.log('factoryDestroyError',err);
});
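To see why a failing create function can lead to the endless output above, here is a toy model. It is a hypothetical, heavily simplified pool and not generic-pool's source; it only assumes the behaviour described here, namely that a failed creation is reported as an event while the acquire request stays queued. A maxAttempts cap is added so the demo terminates:

```javascript
const { EventEmitter } = require('events');

// Toy model only: NOT generic-pool's implementation.
class ToyPool extends EventEmitter {
    constructor(create, maxAttempts) {
        super();
        this.create = create;           // factory function, may throw
        this.maxAttempts = maxAttempts; // safety cap so the demo stops
        this.attempts = 0;
        this.waiting = 0;               // acquire() calls not yet served
    }

    acquire() {
        this.waiting++;
        this.dispense();
    }

    dispense() {
        // Keep creating resources while a caller is still waiting.
        while (this.waiting > 0 && this.attempts < this.maxAttempts) {
            this.attempts++;
            try {
                this.create(this.attempts);
                this.waiting--; // success: the waiting caller is served
            } catch (err) {
                // Failure is only reported as an event; the request
                // stays queued, so the loop tries again.
                this.emit('factoryCreateError', err);
            }
        }
    }
}

const errors = [];
const toyPool = new ToyPool(n => {
    throw new Error(`launch failed (attempt ${n})`);
}, 5);
toyPool.on('factoryCreateError', err => errors.push(err.message));
toyPool.acquire();

console.log(toyPool.attempts); // 5: only the cap stopped the retries
console.log(errors);
```

Without the cap the loop would never end, which matches the ever-growing "create instance" counter. Logging the error in the event handler, as shown above, is what reveals the underlying failure.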

There are some issues related to this:

  • acquire() never resolves/rejects if factory always rejects
  • About the acquire function in pool.js
  • .acquire() doesn't reject when resource creation fails

Good luck!
