

SNS > AWS Lambda asynchronous invocation queue vs. SNS > SQS > Lambda

Asked 2020-07-03 18:24:34. Tags: amazon-web-services, aws-lambda, amazon-sqs, amazon-sns

Background

This architecture relies solely on Lambda's asynchronous invocation mechanism, as described here:

https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html

I have a collector function that is invoked once a minute and fetches a batch of data that can vary drastically in size (from tens of KB to potentially 1-3 MB). The data contains a JSON array of one or more records. The collector function separates these records and publishes each one individually to an SNS topic.
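A minimal sketch of the collector's fan-out step, assuming the batch arrives as a JSON string whose top level is an array. The `fan_out` name and the injected `publish` callable are hypothetical; `publish` stands in for an SNS client call so the splitting logic can be exercised without AWS. Because a single SNS message is limited to 256 KB, the sketch size-checks every record before publishing any of them.

```python
import json
from typing import Callable, List

# SNS rejects messages larger than 256 KB (262,144 bytes).
SNS_MAX_MESSAGE_BYTES = 256 * 1024


def fan_out(batch_json: str, publish: Callable[[str], None]) -> int:
    """Split a JSON array into records and publish each one as its own message.

    `publish` is a hypothetical injected callable; in a real collector it would
    wrap an SNS client call. Returns the number of records published.
    """
    records = json.loads(batch_json)
    if not isinstance(records, list):
        raise ValueError("expected the batch to be a JSON array")

    # Serialize and size-check every record before publishing any of them,
    # so an oversized record doesn't leave the batch half-published.
    messages: List[str] = [json.dumps(record) for record in records]
    for message in messages:
        size = len(message.encode("utf-8"))
        if size > SNS_MAX_MESSAGE_BYTES:
            raise ValueError(f"record of {size} bytes exceeds the SNS message size limit")

    for message in messages:
        publish(message)
    return len(messages)
```

Inside a Lambda handler, `publish` could be `lambda m: sns.publish(TopicArn=topic_arn, Message=m)`, where `sns` is a boto3 SNS client and `topic_arn` is your topic's ARN (both placeholders here).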

A parser function is subscribed to the SNS topic and has a concurrency limit of 3. SNS invokes the parser function asynchronously once per record, which means the built-in, AWS-managed Lambda asynchronous queue begins to fill up once the number of parser instances maxes out at 3. When throttling occurs, the Lambda queueing mechanism retries with incremental backoff until the parser function can process the invocation request.
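A back-of-the-envelope capacity check using Little's law (an addition for illustration, not part of the original post): with a fixed concurrency limit, the parser keeps up only if its average run time stays below concurrency divided by the arrival rate. At 200 invocations per minute, the load the asker expects, 3 instances allow at most 0.9 seconds per record; anything slower means the asynchronous queue grows for as long as that load lasts. The helper names are hypothetical.

```python
def max_average_duration(concurrency: int, invocations_per_minute: float) -> float:
    """Longest average run time (seconds) at which `concurrency` instances keep
    up with the arrival rate (Little's law: busy instances = arrivals x time)."""
    arrivals_per_second = invocations_per_minute / 60.0
    return concurrency / arrivals_per_second


def backlog_growth_per_minute(
    concurrency: int, invocations_per_minute: float, average_duration_seconds: float
) -> float:
    """Events per minute that pile up in the queue when the parser is too slow."""
    served_per_minute = concurrency * 60.0 / average_duration_seconds
    return max(0.0, invocations_per_minute - served_per_minute)
```

For example, if each record took 2 seconds, 3 instances would serve 90 records per minute and the queue would grow by about 110 events per minute.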

It is imperative that no record is lost during this process, as records cannot be recovered. I will be using dead-letter queues where needed to ensure that, in case of error, records ultimately end up somewhere.

Testing this method resulted in no lost invocations; everything worked as expected. Lambda reported hundreds of throttle responses, but I'm relying on these to trigger Lambda's retry behaviour for asynchronous invocations. My understanding is that this behaviour is effectively the same as what I would have to develop and trigger myself if I wanted to retry consuming a message from SQS.
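For reference, the AWS documentation linked above states that for throttling (429) and system (5xx) errors, Lambda returns the event to its queue and keeps retrying for up to 6 hours by default, with the retry interval growing exponentially from 1 second to a maximum of 5 minutes. The sketch below models that schedule; the doubling factor and the absence of jitter are assumptions, since AWS does not publish the exact intervals.

```python
from typing import List

FIRST_RETRY_SECONDS = 1.0                # documented starting interval
MAX_RETRY_INTERVAL_SECONDS = 300.0       # documented cap (5 minutes)
DEFAULT_MAX_EVENT_AGE_SECONDS = 21600.0  # documented default maximum age (6 hours)


def throttle_retry_schedule(
    max_event_age: float = DEFAULT_MAX_EVENT_AGE_SECONDS,
    factor: float = 2.0,  # ASSUMPTION: AWS does not publish the growth factor
) -> List[float]:
    """Return the elapsed times (seconds after the first attempt) at which a
    continuously throttled event would be retried under this simplified model."""
    retry_times: List[float] = []
    elapsed = 0.0
    interval = FIRST_RETRY_SECONDS
    # Keep retrying until the next retry would land past the maximum event age.
    while elapsed + interval <= max_event_age:
        elapsed += interval
        retry_times.append(elapsed)
        interval = min(interval * factor, MAX_RETRY_INTERVAL_SECONDS)
    return retry_times
```

Under this model, a continuously throttled event gets 79 retries before it exceeds the default 6-hour maximum age. After the first 511 seconds, retries happen only every 5 minutes, so a sustained backlog drains slowly.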

Questions

1. Is the built-in, AWS-managed Lambda asynchronous queue reliable?

The parser could be subject to a consistent load of 200+ invocations per minute for prolonged periods, so I want to understand whether the Lambda queue can handle this as sensibly as SQS. The part that concerns me most is this statement:

Even if your function doesn't return an error, it's possible for it to receive the same event from Lambda multiple times because the queue itself is eventually consistent. If the function can't keep up with incoming events, events might also be deleted from the queue without being sent to the function. Ensure that your function code gracefully handles duplicate events, and that you have enough concurrency available to handle all invocations.

This implies that an incoming invocation may simply be deleted out of thin air. Moreover, my implementation relies on the retry behaviour that kicks in when the function is throttled.
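Since the documentation quoted above says the same event can arrive more than once, the parser needs to be idempotent. A minimal sketch, assuming each record carries a unique `"id"` field (an assumption, not something the question states). In a handler subscribed to SNS, the message body is available at `event["Records"][0]["Sns"]["Message"]`. The in-memory set is for illustration only: Lambda instances don't share memory, so a real deployment would need a shared store, such as a conditional write to a table keyed on the record id.

```python
import json
from typing import Callable, Dict, Set


class IdempotentParser:
    """Wrap a record handler so a record delivered more than once is processed once.

    ASSUMPTION: every record has a unique "id" field. The in-memory `_seen` set
    only deduplicates within one process; it stands in for a shared store.
    """

    def __init__(self, handle: Callable[[Dict], None]) -> None:
        self._handle = handle
        self._seen: Set[str] = set()

    def __call__(self, sns_message: str) -> bool:
        record = json.loads(sns_message)
        key = str(record["id"])
        if key in self._seen:
            return False  # duplicate delivery: skip without side effects
        self._handle(record)
        # Mark as seen only after the handler succeeds, so a failure can be retried.
        self._seen.add(key)
        return True
```

Marking the record as seen after the handler succeeds means a crash mid-processing leads to reprocessing rather than loss, which matches the requirement that records must not be lost.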

2. When a message is in the queue, what happens when the message timeout is exceeded?

I can't find a definitive answer, but I'm hoping the message would end up in the configured dead-letter queue.
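On question 2, the Lambda documentation states that when an event exceeds the maximum event age or fails all retry attempts, Lambda discards it; to keep a copy, you configure an on-failure destination or, alternatively, a dead-letter queue. Both limits are set per function through the `PutFunctionEventInvokeConfig` API (`put_function_event_invoke_config` in boto3). The helper below is a hypothetical convenience that builds and range-checks those arguments. Note that `MaximumRetryAttempts` covers function errors only; throttled events are bounded by the maximum event age.

```python
from typing import Any, Dict


def event_invoke_config(
    function_name: str,
    on_failure_arn: str,
    max_event_age_seconds: int = 21600,
    max_retry_attempts: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for Lambda's PutFunctionEventInvokeConfig API.

    MaximumEventAgeInSeconds bounds how long an event (for example, a throttled
    one) may wait in the async queue; MaximumRetryAttempts counts only retries
    after the function itself returns an error.
    """
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    if not 0 <= max_retry_attempts <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    return {
        "FunctionName": function_name,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
        "MaximumRetryAttempts": max_retry_attempts,
        # Events that exhaust retries or exceed the maximum age go here
        # instead of being discarded.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }
```

Usage, with a placeholder function name and queue ARN: `boto3.client("lambda").put_function_event_invoke_config(**event_invoke_config("parser", "arn:aws:sqs:eu-west-1:123456789012:parser-failures"))`.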

3. Why would I use SQS over the Lambda queue when SQS presents other problems?

See the articles below for arguments against SQS. Overpulling (described in the second link) is of particular concern:

https://lumigo.io/blog/sqs-and-lambda-the-missing-guide-on-failure-modes/

https://medium.com/@zaccharles/lambda-concurrency-limits-and-sqs-triggers-dont-mix-well-sometimes-eb23d90122e0

I can't find any articles or discussions of how the Lambda queue performs.

Thanks for reading!

2 answers

Quite an interesting question. There's a presentation that covered queues in detail, but I can't find it at the moment. The premise is the same as this: queues are leaky buckets.

So what if I add more leaky buckets? Well, you've delayed the leaking, but now it's leaking into another bucket. Have you solved the problem, or only delayed it?

What if I vibrate the buckets at different frequencies?
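A toy simulation of the leaky-bucket argument (an illustration added here, not taken from the presentation the answer mentions): items pour into the first bucket at a fixed rate, each bucket drains into the next at its own rate, and the last bucket drains to the consumer. When the slowest leak is the same, one bucket or two hold the same total backlog; the extra bucket only changes where the water sits. The function name and rates are hypothetical.

```python
from typing import List


def simulate_buckets(inflow_per_tick: int, leak_rates: List[int], ticks: int) -> List[int]:
    """Simulate leaky buckets in series and return each bucket's level after `ticks`.

    Each tick, new items pour into the first bucket; then every bucket, in order,
    leaks up to its rate into the next one (the last bucket leaks to the consumer).
    """
    levels = [0] * len(leak_rates)
    for _ in range(ticks):
        levels[0] += inflow_per_tick
        for i, rate in enumerate(leak_rates):
            out = min(levels[i], rate)
            levels[i] -= out
            if i + 1 < len(levels):
                levels[i + 1] += out
    return levels
```

With an inflow of 5 per tick for 10 ticks, a single bucket leaking 3 per tick ends at `[20]`, while a bucket leaking 4 feeding one leaking 3 ends at `[10, 10]`: the same 20 items are still waiting.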

Further reading:

Operate Lambda
Message expiry
Message timeout
DDIA (Designing Data-Intensive Applications) / DDIA Online
SQS performance
SQS failure modes
A minimal reproducible example (MCVE) is missing from this question, so I cannot address the precise problem you are having.
As for an opinion on which to choose, SQS or the Lambda queue, I'll point to Meta on this.
The SQS FAQ mentions Kinesis streams.
SQS / SNS / Kinesis comparison

TL;DR:

It depends.

I think the biggest advantage of using your own queue is that you, as a user, have visibility into the state of your backpressure.

With the Lambda asynchronous invocation method, you may get throttling exceptions, with the "guarantee" that Lambda will retry over some interval. If you use an SQS source queue instead, you have complete, unambiguous visibility into the state of your message processing at all times.
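A sketch of the visibility this answer refers to: SQS exposes approximate queue depth through `GetQueueAttributes` (`get_queue_attributes` in boto3), via the attributes `ApproximateNumberOfMessages`, `ApproximateNumberOfMessagesNotVisible` and `ApproximateNumberOfMessagesDelayed`. The hypothetical `backlog_summary` helper only turns the string values in that response into counts, so it can be exercised without AWS.

```python
from typing import Dict


def backlog_summary(response: Dict) -> Dict[str, int]:
    """Summarize an SQS GetQueueAttributes response into backlog counts.

    SQS returns attribute values as strings, and all three counts are approximate.
    """
    attrs = response["Attributes"]
    waiting = int(attrs.get("ApproximateNumberOfMessages", "0"))          # ready to receive
    in_flight = int(attrs.get("ApproximateNumberOfMessagesNotVisible", "0"))  # being processed
    delayed = int(attrs.get("ApproximateNumberOfMessagesDelayed", "0"))    # not yet available
    return {
        "waiting": waiting,
        "in_flight": in_flight,
        "delayed": delayed,
        "total": waiting + in_flight + delayed,
    }
```

Usage, with a placeholder queue URL: `backlog_summary(boto3.client("sqs").get_queue_attributes(QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages", "ApproximateNumberOfMessagesNotVisible", "ApproximateNumberOfMessagesDelayed"]))`.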

Secondly, regarding overpulling: in theory this is a concern, but in practice it has never happened to me. I've run applications requiring thousands of transactions per second and never once had problems with SQS -> Lambda. Obviously, set your retry policy appropriately and use a DLQ, as transient or unpredictable errors CAN occur.
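For the retry policy and DLQ on the SQS side, AWS's guidance for SQS event sources is to set the source queue's visibility timeout to at least six times the function timeout, and to set `maxReceiveCount` in the redrive policy to at least 5, so that receives ending in throttling (the overpulling case) don't push messages to the DLQ too early. The hypothetical helper below builds the matching queue attributes; `VisibilityTimeout` and `RedrivePolicy` are real SQS attribute names, while the helper itself and the DLQ ARN are placeholders.

```python
import json
from typing import Dict


def source_queue_attributes(
    function_timeout_seconds: int, dlq_arn: str, max_receive_count: int = 5
) -> Dict[str, str]:
    """Build SQS attributes for a Lambda source queue with a dead-letter queue.

    Visibility timeout = 6 x function timeout; maxReceiveCount >= 5 leaves room
    for receives that end in throttling before a message is moved to the DLQ.
    """
    visibility = 6 * function_timeout_seconds
    if visibility > 43200:
        raise ValueError("SQS visibility timeout cannot exceed 12 hours (43200 s)")
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    # SQS expects every attribute value as a string; RedrivePolicy is a JSON string.
    return {
        "VisibilityTimeout": str(visibility),
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
        ),
    }
```

These attributes can be applied with boto3's `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)`, where `queue_url` is a placeholder for the source queue's URL.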