Lambda with DynamoDB trigger on a table whose partition key has more than 500,000 distinct values

We are currently designing a DynamoDB table to store certain file attributes. There are two main attributes:

  1. Date: the date in YYYYMMDD format, e.g. 20190618
  2. FileName: the name of the file, e.g. xxxxxxxxxxx.json

Currently the partition key is Date and the sort key is FileName. We expect about 500,000 files with distinct file names each day (and this can increase over time). The same file names repeat each day, i.e. a typical schema is as shown below:

Date      FileName
20190617  abcd.json
20190618  abcd.json

We have a series of queries based on Date, plus a DynamoDB trigger. The queries are working great. What we are observing, however, is that the number of concurrent Lambda executions is limited to 2, since we partition by date. While trying to improve the Lambda concurrency, we came across two options:

1) Following the partition key sharding guidance ( https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html ), one idea is to add a fixed number of random suffixes to the Date field, i.e. 20190617.1 through 20190617.500, to split each day's data across 500 partition key values of about 1,000 records each. This would give us a degree of concurrency, with minimal changes to the queries (see the sketch below).
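A minimal sketch of what we mean, using boto3 (the table name FileAttributes and the suffix count are placeholders for our real setup):

```python
import random

import boto3
from boto3.dynamodb.conditions import Key

# Placeholder names/values -- adjust for the real table.
TABLE_NAME = "FileAttributes"
SUFFIX_COUNT = 500

table = boto3.resource("dynamodb").Table(TABLE_NAME)

def put_file(date: str, file_name: str) -> None:
    """Write each item under a random suffix, e.g. Date = '20190617.137'."""
    suffix = random.randint(1, SUFFIX_COUNT)
    table.put_item(Item={"Date": f"{date}.{suffix}", "FileName": file_name})

def files_for_date(date: str) -> list:
    """Fan out one Query per suffix and merge the results
    (pagination via LastEvaluatedKey omitted for brevity)."""
    items = []
    for suffix in range(1, SUFFIX_COUNT + 1):
        resp = table.query(
            KeyConditionExpression=Key("Date").eq(f"{date}.{suffix}")
        )
        items.extend(resp["Items"])
    return items
```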

2) The second option is to change the table's keys so that the partition key is FileName and the sort key is Date. This results in about 500,000 distinct partition key values (which can increase). For querying by date we would need to add a GSI, but we would achieve more Lambda concurrency (a sketch of this table definition is below).
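For illustration, a sketch of that alternative table definition, again with boto3 (the table and index names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# "FileAttributes" and "DateIndex" are placeholder names.
dynamodb.create_table(
    TableName="FileAttributes",
    AttributeDefinitions=[
        {"AttributeName": "FileName", "AttributeType": "S"},
        {"AttributeName": "Date", "AttributeType": "S"},
    ],
    # FileName as the partition key spreads items across many
    # distinct key values instead of one hot value per day.
    KeySchema=[
        {"AttributeName": "FileName", "KeyType": "HASH"},
        {"AttributeName": "Date", "KeyType": "RANGE"},
    ],
    # The GSI flips the keys so queries by date remain possible.
    GlobalSecondaryIndexes=[
        {
            "IndexName": "DateIndex",
            "KeySchema": [
                {"AttributeName": "Date", "KeyType": "HASH"},
                {"AttributeName": "FileName", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```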

We have never created a table with 500,000 partition key values (a number that can grow further). Has anybody had experience with this? If so, please comment.

Any help is appreciated.

You seem to be under the mistaken impression that there's a one-to-one correspondence between partition keys and partitions.

This is not the case.

The number of partitions is driven by table size and throughput. DDB hashes the partition key, and that hash determines which partition the item is stored in.

You could have 100k partition keys and only a single partition.

If you're pushing the limits of DDB, then yeah you might end up with only a single partition key in a partition...but that's not typical.

The DDB Whitepaper provides some details on how DDB works...

Partitioning by file name doesn't make a lot of sense if your access pattern is to query by date.

Instead, the idea of increasing the number of partition key values for each date by adding a suffix seems fine. But rather than adding a random suffix, you might consider adding a stable suffix based on the name of the file:

You could use the first letter of the file name to get about 30 sub-partitions, assuming the file names are random. The only trouble is that some letters may be more common than others, giving skewed sub-partitions.

Or, you could take a hash of the file name and use that as the suffix for the partition key. The hash function can be relatively simple, as long as it produces a numeric value in the range of the number of sub-partitions you would like for each date.
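A minimal sketch of such a hash in Python (md5 is just one stable choice; the shard count is a placeholder for whatever per-date concurrency you want):

```python
import hashlib

SHARD_COUNT = 20  # suffixes per date -- a placeholder, tune as needed

def shard_suffix(file_name: str) -> int:
    """Map a file name to a stable suffix in 1..SHARD_COUNT.

    Python's built-in hash() is salted per process, so a deterministic
    hash such as md5 is used to get the same suffix every run.
    """
    digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT + 1

def partition_key(date: str, file_name: str) -> str:
    # e.g. "20190618.<n>" -- the same file always lands in the same
    # sub-partition, so a single item can be fetched directly, while a
    # by-date query fans out over all SHARD_COUNT suffixes.
    return f"{date}.{shard_suffix(file_name)}"
```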

If you end up with about 10,000-50,000 items per partition key, that would probably be great; at 500,000 files per day, that works out to roughly 10-50 suffixes per date.

Hope this helps
