
Why does my Python app always cold start twice on AWS Lambda?

I have a Lambda function in Python that loads a large machine learning model during the cold start. The code looks something like this:

from uuid import uuid4

from fastapi import FastAPI
from mangum import Mangum

from myapp import endpoints              # app-specific module (path assumed from the logs)
from loguru import logger as app_logger  # the log format suggests loguru; assumption

# Everything at module scope runs at import time, i.e. during the Lambda init phase
uuid = uuid4()
app_logger.info("Loading model... %s" % uuid)

endpoints.embedder.load()  # loads the large embedding model

def create_app() -> FastAPI:
    app = FastAPI()

    app.include_router(endpoints.router)

    return app

app_logger.info("Creating app... %s" % uuid)
app = create_app()
app_logger.info("Loaded app. %s" % uuid)
handler = Mangum(app)

The first time after deployment, AWS Lambda seems to initialize the function twice, as shown by the two different UUIDs. Here are the logs:

2023-01-05 21:44:40.083 | INFO     | myapp.app:<module>:47 - Loading model... 76a5ac6f-a4fc-490e-b21c-83bb5ef458eb
2023-01-05 21:44:42.406 | INFO     | myapp.embedder:load:31 - Loading embedding model
2023-01-05 21:44:50.626 | INFO     | myapp.app:<module>:47 - Loading model... c633a9c6-bcfc-44d5-bacf-9834b39ee300
2023-01-05 21:44:51.878 | INFO     | myapp.embedder:load:31 - Loading embedding model
2023-01-05 21:45:00.418 | INFO     | myapp.app:<module>:59 - Creating app... c633a9c6-bcfc-44d5-bacf-9834b39ee300
2023-01-05 21:45:00.420 | INFO     | myapp.app:<module>:61 - Loaded app. c633a9c6-bcfc-44d5-bacf-9834b39ee300

This happens consistently. The first initialization runs for about 10 seconds, then the runtime seems to restart and run the initialization again. There are no errors in the logs that indicate why. The function is configured with 4 GB of memory and always loads with less than 3 GB used.

Any ideas why this happens and how to avoid it?

To summarize all the learnings in the comments so far:

  • AWS limits the init phase to 10 seconds. This is explained here: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
  • If initialization exceeds 10 seconds, Lambda restarts the runtime and runs the initialization again, this time without the limit
  • If you hit the 10-second limit, there are two ways to deal with it:
    • Initialize the model lazily during the first invocation, after the function has loaded (see the first sketch after this list). The downsides are that you don't get the init-phase CPU boost or the lower-cost initialization.
    • Use provisioned concurrency (see the second sketch below). Init is not limited to 10 seconds, but it is more expensive and can still run into the same problem, e.g. when a burst in usage spills over to on-demand instances.
  • Moving my model to EFS does improve startup time compared to S3 and Docker layer caching, but it is not sufficient to get init under 10 seconds. It might work for other use cases with slightly smaller models though.
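Here is a minimal sketch of the first workaround (lazy initialization), assuming the same endpoints.embedder object as in the question; the model load moves out of module scope and into the first request, so the init phase stays well under the 10-second limit. The middleware function and the _model_loaded flag are hypothetical names:

from fastapi import FastAPI
from mangum import Mangum

from myapp import endpoints  # app-specific module (path assumed)

app = FastAPI()
app.include_router(endpoints.router)

_model_loaded = False  # module-level flag; hypothetical name

@app.middleware("http")
async def ensure_model_loaded(request, call_next):
    # Load the model on the first invocation instead of at import time,
    # so the Lambda init phase finishes quickly.
    global _model_loaded
    if not _model_loaded:
        endpoints.embedder.load()
        _model_loaded = True
    return await call_next(request)

handler = Mangum(app)

The first request then pays the model-load latency (and runs without the init-phase CPU boost), but the init phase itself stays fast, so Lambda never hits the 10-second limit and never re-runs the initialization.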
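And a sketch of the second workaround (provisioned concurrency), configured here via boto3; the function name and qualifier are placeholder values, and provisioned concurrency requires a published version or alias, not $LATEST:

import boto3

lambda_client = boto3.client("lambda")

# Keep two instances initialized ahead of time; their init phase is not
# subject to the 10-second limit. "my-model-function" and "live" are
# placeholders for a real function name and alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-model-function",
    Qualifier="live",
    ProvisionedConcurrentExecutions=2,
)

As noted above, requests beyond the provisioned capacity fall back to on-demand instances, which are subject to the same 10-second init limit.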

Perhaps someday SnapStart will address this problem for Python. Until then, I am going back to EC2.
