
AWS Lambda function not scaling - why and what are the options?

I have a python aws-lambda function that takes in an image (geospatial raster) on S3, computes a subset and dumps it to another bucket on S3. The function only runs on demand, there's no schedule.

So basically it has 3 arguments:

  • source
  • destination
  • subset window

I am trying to invoke the function from my local PC in a loop, going over 1000+ sources:

# ...

# create lambda client from botocore session
lambda_client = session.create_client("lambda")

for file in input_files:
    # create payload body with source, destination, window
    # ...

    # asynchronous invocation: returns as soon as the event is queued
    response = lambda_client.invoke(
        FunctionName="foobar",
        InvocationType='Event',
        Payload=json.dumps(payload)
    )

    # 202 means the event was accepted for asynchronous processing
    assert response['ResponseMetadata']["HTTPStatusCode"] == 202

The function is set to have a maximum memory of 1024 MB and a timeout of 15 s, which seems to be just fine.

When I run the invocations, they take quite some time (which is fine) but for some reason, I don't get many concurrent invocations. I haven't set any limits, nor do I see any reason why it would get throttled.

I can see in the metrics dashboard that I don't get more than 8 concurrent executions:

[Image: screenshot of the metrics dashboard]

A couple of Qs:

  • How can I run this function with a higher concurrency?

  • Is there a better way to implement this kind of function?

Notes:

  • I could easily dockerize the function, so porting to another service wouldn't be a big issue.
  • I don't want to have an S3 trigger
  • I didn't try adding an SQS queue as a trigger
  • I don't necessarily want to move to AWS Batch, as I'm not familiar with it (yet) and don't want to spend the time now on reading up on it and getting it running

Based on the comments.

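The Duration metric shows that the average execution time of the Lambda function is about 0.5 seconds. Since concurrency is about 8, this means that the for loop in the question makes about 8 requests within each 0.5-second period, or roughly 16 requests per second. In other words, concurrency is limited by how fast the client sends requests, not by Lambda itself.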
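The arithmetic behind this estimate can be sketched as follows. This is only a rough calculation from the rounded averages quoted above, and the file count of 1000 is an assumed lower bound taken from the question's "1000+ sources".

```python
# Rounded averages from the metrics discussed above.
avg_duration_s = 0.5      # average Lambda execution time
observed_concurrency = 8  # peak concurrent executions seen in the dashboard

# Little's law: concurrency = request rate * duration,
# so the client-side request rate is concurrency / duration.
request_rate = observed_concurrency / avg_duration_s  # requests per second

n_files = 1000  # assumption: the question says "1000+ sources"
total_time_s = n_files / request_rate

print(f"{request_rate:.0f} requests/s, about {total_time_s:.1f} s for {n_files} files")
```

If the client sent requests faster, the same relation suggests concurrency would rise proportionally, up to whatever account limit applies.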

Since the execution time is so short, a possible solution to improve the time efficiency is to batch the requests, so that multiple payloads are sent to the function in a single API call. This not only reduces the number of API calls to AWS, but also extends the execution time of each invocation.
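A minimal sketch of the batching idea, assuming the client from the question is passed in as `lambda_client`. The helper names `chunked` and `invoke_in_batches`, the `"items"` payload key, and the default batch size of 25 are all made up for illustration and are not part of any AWS API.

```python
import json
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def invoke_in_batches(lambda_client, payloads, batch_size=25, function_name="foobar"):
    """Send one asynchronous invoke call per batch of payloads.

    Returns the number of API calls made.
    """
    calls = 0
    for batch in chunked(payloads, batch_size):
        response = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",
            # "items" is a hypothetical key; the handler must expect it.
            Payload=json.dumps({"items": batch}),
        )
        if response["ResponseMetadata"]["HTTPStatusCode"] != 202:
            raise RuntimeError("batch was not accepted for asynchronous invocation")
        calls += 1
    return calls
```

With this approach the handler would have to loop over `event["items"]` instead of handling a single payload, and the 15 s timeout would probably need to be raised to cover a whole batch. Asynchronous invocations also have a payload size limit, so batches should stay small.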

The alternative is to perform the invoke API calls in parallel, rather than one by one as is currently done in the for loop.
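One way to sketch the parallel variant is a thread pool, since each invoke call spends most of its time waiting on the network. The helper names `invoke_one` and `invoke_parallel` and the worker count of 32 are assumptions for illustration; `lambda_client` is the client created in the question.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def invoke_one(lambda_client, payload, function_name="foobar"):
    """Fire one asynchronous invocation and return its HTTP status code."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload),
    )
    return response["ResponseMetadata"]["HTTPStatusCode"]


def invoke_parallel(lambda_client, payloads, max_workers=32):
    """Invoke the function for every payload using a pool of threads.

    Returns the status codes in the same order as `payloads`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: invoke_one(lambda_client, p), payloads))
```

A call such as `statuses = invoke_parallel(lambda_client, payloads)` followed by a check that every status is 202 would replace the loop. With many workers it may also be necessary to enlarge the client's connection pool, for example through `botocore.config.Config(max_pool_connections=...)`.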
