I have a DynamoDB table with nearly 200k items. I need to trigger a Lambda for each item in it (send each item to the Lambda as input). I want to perform this every x hours for all the items in the table. Data in the table changes every 5 days or so.
Is there a server-less way to automate fetching all the items and feeding them to the Lambda, via SQS, etc.?
I cannot have a single Lambda scan the entire table, since that is too much for one invocation to handle (given the 300-second limit, etc.).
Thanks, Vinod.
Both scanning the table and modifying all the data in DynamoDB on every run are not feasible.
You can keep all the DynamoDB keys in a cache like Redis. A separate job can take the keys from Redis and put them on SQS, where the Lambda is listening. The Redis keys can be kept up to date using DynamoDB Streams.
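The "separate job" above could be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the Redis set name `item-keys` and the queue URL are assumptions, and the Redis and SQS clients are passed in so the helper stays testable.

```python
import json
from itertools import islice

# Assumption: the queue URL below is a placeholder, not a real queue.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/item-keys-queue"

def batches(iterable, size=10):
    """Yield lists of up to `size` items (SQS SendMessageBatch accepts at most 10)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def enqueue_keys(redis_client, sqs_client):
    """Read every key from the Redis set and push them to SQS in batches,
    so a Lambda subscribed to the queue processes one item per message."""
    keys = redis_client.smembers("item-keys")  # assumption: keys live in this set
    for batch in batches(keys, 10):
        sqs_client.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=[
                {"Id": str(n), "MessageBody": json.dumps({"key": k})}
                for n, k in enumerate(batch)
            ],
        )
```

A scheduled EventBridge rule (or cron anywhere) could invoke `enqueue_keys` every x hours.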
DynamoDB doesn't offer a good way to trigger a Lambda for items that already exist. There are, however, a few ways you could approach this problem:
You mentioned that you were concerned about the Lambda not having enough resources to scan all of the items in the table. You could try operating on the data in smaller chunks to avoid hitting resource limitations. Lambdas now have a maximum execution time of 15 minutes, which should be enough for most jobs. (Note that in Lambda, CPU scales with memory, so depending on the job, over-provisioning memory could actually save you money by reducing the time the function takes to complete.)
In ECS, using Fargate, you can serverlessly run tasks on a cron schedule. If you are worried about resource limits, you can provision up to 4 vCPU and 32GB of memory per task, which makes it far less likely you will hit a resource limit. Here is some documentation on how to set that up.
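As a rough configuration sketch (every name and ARN below is a placeholder, not taken from the question), a scheduled Fargate task can be wired up with two CloudWatch Events / EventBridge calls:

```shell
# Run the scan task every 6 hours (placeholder rule name and schedule).
aws events put-rule \
  --name refresh-items-every-6h \
  --schedule-expression "rate(6 hours)"

# Point the rule at a Fargate task definition (all ARNs/subnets are placeholders).
aws events put-targets \
  --rule refresh-items-every-6h \
  --targets '[{
    "Id": "scan-task",
    "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
    "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
    "EcsParameters": {
      "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/table-scanner:1",
      "TaskCount": 1,
      "LaunchType": "FARGATE",
      "NetworkConfiguration": {
        "awsvpcConfiguration": {
          "Subnets": ["subnet-0abc1234"],
          "AssignPublicIp": "ENABLED"
        }
      }
    }
  }]'
```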
You can configure your DynamoDB table to trigger a Lambda whenever data in the table is inserted, modified, or removed, and then process items as they come in or as they change. You can even configure it to batch changes (up to 10 items) to reduce Lambda invocations. Here is a link to the documentation.
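A Lambda subscribed to the table's stream receives records shaped roughly like the event below. This is a minimal sketch (assuming a stream view type that includes item images); `process_item` is a placeholder for your per-item work.

```python
def process_item(image):
    """Placeholder for your per-item work. `image` is the item's attributes
    in DynamoDB's typed JSON form, e.g. {"id": {"S": "abc"}}."""
    print(image)

def handler(event, context):
    """Dispatch each stream record by its event type."""
    processed = 0
    for record in event.get("Records", []):
        name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        if name in ("INSERT", "MODIFY"):
            process_item(record["dynamodb"]["NewImage"])
        elif name == "REMOVE":
            # OldImage is only present for some stream view types.
            process_item(record["dynamodb"].get("OldImage", {}))
        processed += 1
    return {"processed": processed}
```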
Note: this method doesn't trigger for items already in the table. However, you can get around this by writing a script to update an arbitrary field on those items.
When you say you want to "trigger" for each item in it, it's not 100% clear what you mean. In general I think DynamoDB Streams fit that need, but something has to cause the records to be processed by the stream. That is often done with a simple UpdateItem
on each record, setting a field that likely isn't part of your data to something like the current time, or something else unique. From there, each record will flow through to a Lambda triggered on the stream.
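The "touch" script described above could look roughly like this. It is a sketch under assumptions: the table name, a simple string partition key named `id`, and the attribute name `lastTouched` are all placeholders; the low-level DynamoDB client is passed in.

```python
import time

TABLE = "my-table"   # assumption: placeholder table name
KEY_ATTR = "id"      # assumption: simple string partition key

def touch_all_items(dynamodb):
    """Scan the table page by page (Scan returns at most 1 MB per call)
    and update a throwaway field on every item, so each change flows
    through the table's stream to the subscribed Lambda."""
    now = str(int(time.time()))
    start_key = None
    while True:
        kwargs = {"TableName": TABLE, "ProjectionExpression": KEY_ATTR}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        page = dynamodb.scan(**kwargs)
        for item in page["Items"]:
            dynamodb.update_item(
                TableName=TABLE,
                Key={KEY_ATTR: item[KEY_ATTR]},
                UpdateExpression="SET #t = :now",
                ExpressionAttributeNames={"#t": "lastTouched"},
                ExpressionAttributeValues={":now": {"S": now}},
            )
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break
```

Because it pages with `LastEvaluatedKey`, this script can run anywhere (a Fargate task, a local machine) without needing to fit in one Lambda invocation.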
The 100% serverless way to loop through the data is the following:
I would explore SQS. Have a Lambda fetch up to 25 records (the batch maximum) at a time, do what it needs to, and mark the records (such as by updating a timestamp on them; use that timestamp as a filter to ensure your fetches only ever return records that still need updating). Keep fetching records until the Lambda eventually times out. Since it did not finish, you will not have had a chance to mark your SQS job as complete by deleting the message. SQS messages have a visibility timeout; when it expires, the message reappears in the queue, causing a Lambda to run another batch, until eventually the Lambda finds no more records and can then delete the SQS message. We use this to refresh Elasticsearch indices with all records whenever our index mapping changes.
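The loop described above can be sketched as follows. This is an illustration, not the answerer's actual code: the 6-hour staleness window is an assumption, and `fetch_stale_batch` and `process` stand in for the real DynamoDB query (filtered on the timestamp) and the per-record work plus timestamp update.

```python
STALE_AFTER = 6 * 3600  # assumption: records older than 6 hours need a refresh

def is_stale(last_refreshed, now, stale_after=STALE_AFTER):
    """True if the record's timestamp says it still needs processing.
    Used as the fetch filter so finished records are never re-fetched."""
    return (now - last_refreshed) >= stale_after

def handler(event, context, fetch_stale_batch=None, process=None):
    """Keep processing batches until no stale records remain. If the
    Lambda times out first, the SQS message is never deleted; after the
    visibility timeout it reappears and a fresh Lambda resumes the work,
    skipping already-refreshed records thanks to the timestamp filter."""
    while True:
        batch = fetch_stale_batch(limit=25)  # query filtered via is_stale
        if not batch:
            # Returning normally lets the Lambda/SQS integration delete
            # the message, ending the refresh job.
            return "done"
        for record in batch:
            process(record)  # do the work and bump the record's timestamp
```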