
Will DynamoDB streams items expire if Lambda can't keep up?

We have configured DynamoDB streams to trigger a Lambda function. More than 10 million unique records will be inserted into the DynamoDB table within 30 minutes, and the Lambda function will process these records as it is triggered through the stream.

As per the DynamoDB Streams documentation, stream records expire after 24 hours.

Questions:

Does this mean that the Lambda function (across multiple concurrent executions) must finish processing all 10 million records within 24 hours?

If some stream events remain unprocessed after 24 hours, will they be lost?

As long as you don't throttle the Lambda, it won't fail to keep up.

What will happen is that the stream will be batched according to your settings: if the event source mapping for your DynamoDB stream is set to a batch size of 5, it will bundle five events at a time and push them to Lambda.
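The batch size is configured on the event source mapping that connects the stream to the function. A minimal sketch with the AWS CLI; the function name, account ID, table name, and stream label below are placeholders:

```shell
# Hypothetical names/ARN; --batch-size can be 1-10,000 for DynamoDB streams
aws lambda create-event-source-mapping \
  --function-name process-stream \
  --event-source-arn arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/2024-01-01T00:00:00.000 \
  --starting-position LATEST \
  --batch-size 5
```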

Even if that happens hundreds of times a minute, Lambda will (again, assuming you aren't deliberately limiting Lambda executions) spin up additional concurrent executions to handle the load.

This is standard AWS philosophy. Pretty much every serverless resource (and even some that aren't, like EC2 with Elastic Beanstalk) is designed to scale horizontally, seamlessly and effortlessly, to handle burst traffic.

Likely your Lambda executions will be done within a couple of minutes of the last event being sent. The 24-hour limit applies to how long records wait in the stream before being processed, not to how long your Lambdas run. You can even exploit that window deliberately: for example, use scheduled CloudWatch events to 'hold' the DynamoDB stream until a certain time of day and then process everything, such as waiting until off hours to let the stream drain, then turning the trigger off again during business hours the next day.
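That hold-and-release pattern can be sketched by disabling and re-enabling the event source mapping on a schedule. The UUID below is a placeholder for your mapping's ID, and the pause must stay well under 24 hours so records don't expire from the stream:

```shell
# Pause stream consumption; records keep accumulating in the stream
aws lambda update-event-source-mapping --uuid <mapping-uuid> --no-enabled

# Resume it during off hours (e.g. from a scheduled job)
aws lambda update-event-source-mapping --uuid <mapping-uuid> --enabled
```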

To give a similar example: I ran 10,000 executions through an SQS queue into a Lambda, and it completed all 10,000 in about 15 minutes. Lambda concurrency is designed to handle this kind of burst flow.

Your DynamoDB read/write capacity is going to be hammered, however, so make sure the table uses on-demand capacity rather than provisioned throughput.
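Switching an existing table to on-demand capacity is a one-line change, sketched here with the AWS CLI (the table name is a placeholder):

```shell
aws dynamodb update-table --table-name MyTable --billing-mode PAY_PER_REQUEST
```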

UPDATE

As @Maurice pointed out in the comments, there is a limit on how many concurrent batches a DynamoDB stream can deliver at once. The calculation indicates you will fall far short even with a short Lambda execution time: the longer each Lambda runs, the less likely you are to complete in time.
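A back-of-the-envelope version of that calculation, assuming the stream delivers one batch per shard at a time. Batch size 100 and parallelization factor 1 are the defaults; the shard count and per-invocation duration below are illustrative assumptions:

```python
def hours_to_drain(records, shards, parallelization, batch_size, batch_seconds):
    """Rough drain time when each shard delivers one batch at a time."""
    batches_in_flight = shards * parallelization
    records_per_second = batches_in_flight * batch_size / batch_seconds
    return records / records_per_second / 3600

# 10M records, 4 shards (assumed), default batch size and parallelization,
# 30 s per invocation (assumed): roughly 208 hours, far past the 24 h window.
print(hours_to_drain(10_000_000, shards=4, parallelization=1,
                     batch_size=100, batch_seconds=30))
```

Raising the batch size and parallelization factor shifts the result, but the point stands: with realistic invocation times the stream drains far too slowly for 10 million records in a day.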

This means that if you don't need everything processed as quickly as possible, you should divide up the input.

You can add an AWS SQS queue somewhere in the process, most likely before the insert into DynamoDB, because even with the largest batch size and a very quick process you won't get through all of the records otherwise.

SQS retains messages for up to 14 days, which may be enough to do what you want. If you have control of the incoming messages, you can insert them into an SQS queue with a wait attached in order to process a smaller amount of inserts at once: an amount that can be handled within a single day, or slightly less. The pipeline would be:

Lambda to collate your inserts into an SQS queue -> SQS with a wait/smaller batch size -> Lambda to insert smaller batches into DynamoDB -> DynamoDB Stream -> processing Lambda
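A quick sanity check on that approach: split the load into daily chunks and confirm they fit within SQS's 14-day message retention. The drainable-per-day figure is an assumption about how much your stream consumer can comfortably handle:

```python
import math

def staging_plan(total_records, drainable_per_day, retention_days=14):
    """Number of daily chunks needed, and whether they fit in SQS retention."""
    days = math.ceil(total_records / drainable_per_day)
    return days, days <= retention_days

# 10M records at an assumed 1M comfortably processed inserts per day:
print(staging_plan(10_000_000, 1_000_000))  # -> (10, True)
```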

The other option is to do something similar but use a Step Functions state machine with wait states and Map states. State machines have a one-year run time limit, so you have plenty of time with that one.

The final option is, instead of streaming the data straight into Lambda, to run Lambdas that each query a smaller section of the table at once and process it.
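One way to split that work is DynamoDB's parallel scan, where each worker calls Scan with the Segment and TotalSegments parameters. A sketch of dividing segments among workers (the segment and worker counts are arbitrary illustrations):

```python
def scan_assignments(total_segments, workers):
    """Round-robin DynamoDB parallel-scan segments across workers.
    Each worker then calls Scan(Segment=s, TotalSegments=total_segments)
    for its assigned segments."""
    return {w: list(range(w, total_segments, workers)) for w in range(workers)}

print(scan_assignments(8, 3))  # -> {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5]}
```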
