
DynamoDB simple query execution time

I have a Python AWS Lambda function that queries an AWS DynamoDB table. My API now takes about 1 second to respond with a very simple query and table setup, so I want to understand where I can optimize.

The table has only 3 items (users) at the moment and the following structure:

user_id (Primary Key, String),
details:
    [{
      "_nested_atrb1_str": "abc",
      "_nested_atrb2_str": "def",
      "_nested_map": [nested_item1, nested_item2]},
     {..}]

The query is super simple:

from boto3.dynamodb.conditions import Key

response = table.query(
    KeyConditionExpression=Key('user_id').eq("xyz")
)

The query takes 0.8-0.9 seconds.

  • Is this a normal query time for a table with only 3 items, where each user has at most 5 attributes (including nested ones)?
  • If yes, can I expect similar times if the structure stays the same but the number of items (users) increases a hundredfold?

There are a few things to investigate. First off, is your timing of 0.8-0.9 seconds based on timing the query directly, by wrapping it in a timer such as time or timeit? If the query itself is truly taking that long, then something is definitely not right in the interaction between Lambda and DynamoDB.
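As a minimal sketch of timing the call directly, the helper below wraps any callable with time.perf_counter. The names timed_call and fake_query are hypothetical; fake_query is only a stand-in so the snippet runs without AWS access.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_query(**kwargs):
    # Hypothetical stand-in for table.query(...); sleeps to simulate latency.
    time.sleep(0.05)
    return {"Items": [], "Count": 0}


response, elapsed_ms = timed_call(fake_query, KeyConditionExpression="placeholder")
print(f"query took {elapsed_ms:.1f} ms")
```

Inside the Lambda you would pass table.query and its real KeyConditionExpression instead of fake_query. If the printed time is small but the API still takes about a second, the latency is coming from somewhere other than the query itself.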

If the time you're seeing is actually from the invoke of your Lambda (I assume this is through API Gateway as a REST API since you mentioned "api") then the time you're seeing could be due to many factors. Can you profile the API call? I would check to see through Postman or even browser tools if you can profile to see the time for DNS lookup, SSL setup, etc. Additionally, CloudWatch will give you metrics specific to the call times for your Lambda once the request has reached Lambda. You could also look at enabling X-Ray which will give you more details in regards to the execution of your Lambda. If your Lambda is running in a VPC you could also be encountering cold starts that are leading to the latency you're seeing.

X-Ray: https://aws.amazon.com/xray/

Cold Starts: just Google "AWS Lambda cold starts" and you'll find all kinds of info

For anyone with similar experiences, I received the AWS developer support response below, which includes some useful references. It didn't solve my problem, but I now understand that this is mainly related to the low (test) volume and Lambda startup time.

1) Is this a normal query time for a table with only 3 items where each user only has max 5 attributes(incl nested)?

The time is slow but could be due to a number of factors based on your setup. Since you are using Lambda you need to keep in mind that every time you trigger your lambda function it sets up your environment and then executes the code. An AWS Lambda function runs within a container—an execution environment that is isolated from other functions. When you run a function for the first time, AWS Lambda creates a new container and begins executing the function's code. A Lambda function has a handler that is executed once per invocation. After the function executes, AWS Lambda may opt to reuse the container for subsequent invocations of the function. In this case, your function handler might be able to reuse the resources that you defined in your initialization code. (Note that you cannot control how long AWS Lambda will retain the container, or whether the container will be reused at all.) Your table is really small, I had a look at it. [1]

2) Can I expect similar times if the structure stays the same but the number of items (users) increases hundred-fold?

If the code takes longer to execute and you have more data in DynamoDB eventually it could slow down, again based on your set up.

Some of my recommendations on optimizing your set up.

1) Have Lambda and DynamoDB within the same VPC. You can query your DynamoDB via a VPC endpoint. This will cut out any network latencies. [2][3]

2) Increase memory on lambda for faster startup and execution times.

3) As your application scales, make sure to enable auto-scaling on your DynamoDB table and also increase your RCU and WCU to improve DynamoDB's performance when handling requests. [4]

Additionally, have a look at DynamoDB best practices. [5]

Please feel free to contact me with any additional questions and for further guidance. Thank you. Enjoy your day. Have a great day.

References

  1. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.BestPracticesWithDynamoDB.html
  2. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/vpc-endpoints-dynamodb.html
  3. https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
  4. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html
  5. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html

Profiling my small lambda code (outside of lambda) I got these results that you may find interesting.

Times in milliseconds

# Initially

3 calls to DB, 
1350 ms 1st call (read)
1074 ms 2nd call (write)
1051 ms 3rd call (read)


# After doing this outside the DB calls and providing it to each one
dynamodb = boto3.resource('dynamodb',region_name=REGION_NAME)

  12 ms executing the line above
1324 ms 1st call (read)
 285 ms 2nd call (write)
 270 ms 3rd call (read)


# seeing that reusing was producing savings I did the same with
tableusers = dynamodb.Table(TABLE_USERS)

  12 ms create dynamodb handler
   3 ms create table handler
1078 ms read reusing dynamodb and table
 280 ms write reusing dynamodb and table
 270 ms read reusing dynamodb (not table)

So initially the three calls took about 3.5 seconds in total; now everything takes about 1.6 seconds, just from adding 2 lines of code.
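To make the comparison explicit, this small sketch sums the measurements listed above for each stage. The stage labels are mine; the numbers are copied from the timings in this answer.

```python
# Timings in milliseconds, copied from the measurements above.
stages = {
    "initial": [1350, 1074, 1051],
    "reuse resource": [12, 1324, 285, 270],
    "reuse resource and table": [12, 3, 1078, 280, 270],
}

# Total time per stage, including the one-off setup lines.
totals = {name: sum(timings) for name, timings in stages.items()}

for name, total in totals.items():
    print(f"{name}: {total} ms")
# initial: 3475 ms
# reuse resource: 1891 ms
# reuse resource and table: 1643 ms
```

In every stage the first call stays above one second; the savings come from the second and third calls.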

I got these results using %lprun on jupyter / Colab

# The -u 0.001 sets the time unit at 1ms (default is 1 microsecond)
%lprun  -u 0.001 -f lambdaquick lambdaquick()  

If you only do 1 DB request and nothing else with the DB, try to put the 2 DB handlers outside the lambda handler as amittn recommends.

Disclaimer: I just learned all this, including deep profiling. So all this may be nonsense.

Note: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. -- Donald Knuth" from https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html

https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/best-practices.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStarted.Python.03.html

If you are seeing this issue only on the first invocation, then it's definitely due to a Lambda cold start. Otherwise, subsequent requests should show an improvement, which might help you diagnose the actual pain point. CloudWatch Logs will also help in tracking the request.
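One way to tell cold and warm invocations apart in the logs is a module-level flag, sketched below. The flag name and log format are my own; in Lambda, anything printed to stdout ends up in CloudWatch Logs.

```python
import time

# Module-level code runs once per execution environment, i.e. on a cold start.
_is_cold_start = True


def lambda_handler(event, context):
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False  # later invocations in this environment are warm

    started = time.perf_counter()
    # ... the real work (e.g. the DynamoDB query) would go here ...
    handler_ms = (time.perf_counter() - started) * 1000.0

    print(f"cold_start={cold} handler_ms={handler_ms:.2f}")
    return {"cold_start": cold}
```

If only the requests logged with cold_start=True are slow, the delay is in environment startup and initialization rather than in the query.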

I am assuming that you are reusing your connections as it cuts several milliseconds off your execution time. If not this will help you achieve that. Any variable outside the lambda_handler function will be frozen in between Lambda invocations and possibly reused. The documentation states to “not assume that AWS Lambda always reuses the container because AWS Lambda may choose not to reuse the container.” but it's observed that depending on the volume of executions, the container is almost always reused.
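A minimal sketch of that reuse, under some assumptions: REGION_NAME and TABLE_USERS hold placeholder values, the table's primary key is user_id, and get_item is used for brevity instead of query. The handle is cached lazily with functools.lru_cache rather than created by plain module-level statements; the effect is the same on warm invocations. The table_factory parameter is a hypothetical hook so the handler can be exercised without AWS.

```python
import functools

# Hypothetical placeholders; replace with your own region and table name.
REGION_NAME = "eu-west-1"
TABLE_USERS = "users"


@functools.lru_cache(maxsize=1)
def get_users_table():
    """Build the DynamoDB table handle once per execution environment."""
    import boto3  # imported here so the sketch loads even without boto3 installed

    dynamodb = boto3.resource("dynamodb", region_name=REGION_NAME)
    return dynamodb.Table(TABLE_USERS)


def lambda_handler(event, context, table_factory=get_users_table):
    # On a warm invocation the cached handle is returned immediately.
    table = table_factory()
    response = table.get_item(Key={"user_id": event["user_id"]})
    return response.get("Item")
```

Lambda calls the handler with only event and context, so the default factory is used there; the first invocation in a fresh container still pays the setup cost.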


 