How can the AWS Lambda concurrent execution limit be reached?

Question

UPDATE

The original test code below is largely correct, but in Node.js the AWS service clients should be set up a bit differently, as per the SDK link provided by @Michael-sqlbot:

// manager
const AWS = require("aws-sdk")
const https = require('https');
const agent = new https.Agent({
    maxSockets: 498 // workers hit this level; expect plus 1 for the manager instance
});
const lambda = new AWS.Lambda({
    apiVersion: '2015-03-31',
    region: 'us-east-2', // Initial concurrency burst limit = 500
    httpOptions: {   // <--- replace the default of 50 (https) by
        agent: agent // <--- plugging the modified Agent into the service
    }
})
// NOW begin the manager handler code

In planning for a new service, I am doing some preliminary stress testing. After reading about the 1,000 concurrent execution limit per account and the initial burst rate (which in us-east-2 is 500), I was expecting to achieve at least the 500 burst concurrent executions right away. The screenshot below of CloudWatch's Lambda metric shows otherwise. I cannot get past 51 concurrent executions no matter what mix of parameters I try. Here's the test code:

// worker
exports.handler = async (event) => {
    // declare sleep promise
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    // return after one second
    let nStart = new Date().getTime()
    await sleep(1000)
    return new Date().getTime() - nStart; // report the exact ms the sleep actually took
};

// manager
const AWS = require("aws-sdk")
exports.handler = async(event) => {
    const invokeWorker = async() => {
        try {
            let lambda = new AWS.Lambda() // NO! DO NOT DO THIS, SEE UPDATE ABOVE
            var params = {
                FunctionName: "worker-function",
                InvocationType: "RequestResponse",
                LogType: "None"
            };
            return await lambda.invoke(params).promise()

        }
        catch (error) {
            console.log(error)
        }
    };

    try {
        let nStart = new Date().getTime()
        let aPromises = []

        // invoke workers
        for (var i = 1; i <= 3000; i++) {
            aPromises.push(invokeWorker())
        }

        // record time to complete spawning
        let nSpawnMs = new Date().getTime() - nStart

        // wait for the workers to ALL return
        let aResponses = await Promise.all(aPromises)

        // sum all the actual sleep times
        const reducer = (accumulator, response) => { return accumulator + parseInt(response.Payload) };
        let nTotalWorkMs = aResponses.reduce(reducer, 0)

        // show me
        let nTotalET = new Date().getTime() - nStart
        return {
            jobsCount: aResponses.length,
            spawnCompletionMs: nSpawnMs,
            spawnCompletionPct: `${Math.floor(nSpawnMs / nTotalET * 10000) / 100}%`,
            totalElapsedMs: nTotalET,
            totalWorkMs: nTotalWorkMs,
            parallelRatio: Math.floor(nTotalET / nTotalWorkMs * 1000) / 1000
        }
    }

    catch (error) {
        console.log(error)
    }
};

Response:
{
  "jobsCount": 3000,
  "spawnCompletionMs": 1879,
  "spawnCompletionPct": "2.91%",
  "totalElapsedMs": 64546,
  "totalWorkMs": 3004205,
  "parallelRatio": 0.021
}

Request ID:
"43f31584-238e-4af9-9c5d-95ccab22ae84"

Am I hitting a different limit that I have not mentioned? Is there a flaw in my test code? I was attempting to hit the limit here with 3,000 workers, but there was NO throttling encountered, which I guess is due to the Asynchronous invocation retry behaviour.

Edit: There is no VPC involved on either Lambda; the setting in the select input is "No VPC".

Edit: Showing CloudWatch before and after the fix

Answer 1

There were a number of potential suspects, particularly due to the fact that you were invoking Lambda from Lambda, but your focus on consistently seeing a concurrency of 50 — a seemingly arbitrary limit (and a suspiciously round number) — reminded me that there's an anti-footgun lurking in the JavaScript SDK:

In Node.js, you can set the maximum number of connections per origin. If maxSockets is set, the low-level HTTP client queues requests and assigns them to sockets as they become available.

Here of course, "origin" means any unique combination of scheme + hostname, which in this case is the service endpoint for Lambda in us-east-2 that the SDK is connecting to in order to call the Invoke method, https://lambda.us-east-2.amazonaws.com.

This lets you set an upper bound on the number of concurrent requests to a given origin at a time. Lowering this value can reduce the number of throttling or timeout errors received. However, it can also increase memory usage because requests are queued until a socket becomes available.
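To see the queuing described above in isolation, here is a minimal sketch that needs no AWS account. It assumes a throwaway local HTTP server standing in for the Lambda endpoint; the function name measurePeakConcurrency and its parameters are hypothetical, made up for this illustration. It uses plain http rather than https to avoid certificates, but https.Agent accepts the same maxSockets option.

```javascript
const http = require('http');

// Fire `total` requests at a throwaway local server through an Agent capped at
// `maxSockets`, and resolve with the highest number of requests the server
// ever had in flight at the same time.
function measurePeakConcurrency(total, maxSockets, holdMs) {
  return new Promise((resolve, reject) => {
    let inFlight = 0;
    let peak = 0;

    // Stand-in for the worker: hold each request open for `holdMs`.
    const server = http.createServer((req, res) => {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      setTimeout(() => {
        inFlight -= 1;
        res.end('ok');
      }, holdMs);
    });

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const agent = new http.Agent({ maxSockets }); // the cap under test
      let done = 0;
      let settled = false;

      // Tear everything down exactly once, then report the result.
      const finish = (err) => {
        if (settled) return;
        settled = true;
        agent.destroy();
        server.close();
        if (err) reject(err);
        else resolve(peak);
      };

      // All requests are issued immediately; the Agent queues the excess.
      for (let i = 0; i < total; i++) {
        http.get({ host: '127.0.0.1', port, agent }, (res) => {
          res.resume(); // drain the body so 'end' fires
          res.on('end', () => {
            done += 1;
            if (done === total) finish();
          });
        }).on('error', finish);
      }
    });
  });
}
```

Calling measurePeakConcurrency(20, 4, 100) should resolve to 4: all 20 requests are created at once, but only 4 sockets are open at any moment, which is the same shape as the 3,000 invocations plateauing near 50 in the question.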

...

When using the default of https, the SDK takes the maxSockets value from the globalAgent. If the maxSockets value is not defined or is Infinity, the SDK assumes a maxSockets value of 50.

https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-configuring-maxsockets.html
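As a rough sanity check, the numbers in the question's Response block are consistent with a cap of 50 sockets. The sketch below is back-of-the-envelope arithmetic only; the helper names are hypothetical, and the estimate ignores cold starts and network latency.

```javascript
// Numbers copied from the Response block in the question.
const observed = { jobsCount: 3000, totalElapsedMs: 64546, totalWorkMs: 3004205 };

// Average number of workers running at once:
// total work performed divided by wall-clock time.
function effectiveConcurrency({ totalWorkMs, totalElapsedMs }) {
  return totalWorkMs / totalElapsedMs;
}

// Best-case wall-clock time if at most `maxConcurrent` jobs,
// each taking `jobMs`, can run at the same time.
function estimateElapsedMs(jobs, jobMs, maxConcurrent) {
  return Math.ceil(jobs / maxConcurrent) * jobMs;
}

console.log(effectiveConcurrency(observed).toFixed(1)); // 46.5
console.log(estimateElapsedMs(3000, 1000, 50));         // 60000
console.log(estimateElapsedMs(3000, 1000, 498));        // 7000
```

The measured 64,546 ms is close to the 60,000 ms best case for 50 at a time, and the effective concurrency of about 46.5 is simply the reciprocal of the reported parallelRatio of 0.021. With the Agent raised to 498 sockets, the same estimate drops to roughly 7 seconds.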

Answer 2

Lambda concurrency is not the only factor that decides how scalable your functions are. If your Lambda function is running within a VPC, it will require an ENI (Elastic Network Interface), which allows network traffic to and from the container (Lambda function).

It's possible your throttling occurred because too many ENIs were being requested (50 at a time). You can check this by viewing the logs of the manager Lambda function and looking for an error message when it tries to invoke one of the child containers. If the error looks something like the following, you'll know ENIs are your issue.

Lambda was not able to create an ENI in the VPC of the Lambda function because the limit for Network Interfaces has been reached.

2 answers

solution1
2 ACCEPTED 2019-02-12 15:21:43

solution2
1 2019-02-11 15:20:24
