
AWS Lambda read from SQS without concurrency

Posted 2023-01-02 14:42:43. Tags: amazon-web-services, aws-lambda, amazon-sqs

My requirement is as follows:

1. Read from an SQS queue every 2 hours, take all the available messages, and process them.
2. Processing includes creating a file with details from the SQS messages and sending it to an SFTP server.

I implemented an AWS Lambda function with an SQS trigger to achieve point 1. I set the batch size to 50 and the batch window to 2 hours. My assumption was that the Lambda would be triggered every 2 hours, that 50 messages would be delivered to the function in one go, and that I would create one file for every 50 records.

But I observed that my Lambda function is triggered with a varying number of messages (sometimes 50, sometimes 20, sometimes 5, etc.) even though I configured a batch size of 50.
After reading some documentation I learned (I am not sure about this) that Lambda spawns 5 long-polling connections to read from SQS, and that this is why the function is triggered with a varying number of messages.

My questions are:

1. Is my assumption about 5 parallel connections being established correct? If yes, is there a way I can control it? I want this to happen in a single thread / connection.
2. If 1 is not possible, what other alternatives do I have? I do not want one file created for every few records. I want one file generated every two hours containing all the messages in SQS.

2 Answers

An "SQS trigger" for Lambda is implemented with the so-called Event Source Mapping integration, which polls, batches and deletes messages from the queue on your behalf. It's designed for continuous polling, although you can disable it. You can set a batch size of up to 10,000 records per function invocation (BatchSize) and a batching window of up to 300 seconds (MaximumBatchingWindowInSeconds). That doesn't meet your once-every-two-hours requirement.
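To make those limits concrete, here is a minimal sketch that builds the parameters for boto3's `create_event_source_mapping` call and rejects values outside the ranges quoted above. The helper name `build_sqs_mapping_params` and the example ARN and function name are hypothetical; only the two limits come from this answer.

```python
from typing import Any, Dict

MAX_BATCH_SIZE = 10_000      # upper limit quoted above for BatchSize
MAX_WINDOW_SECONDS = 300     # upper limit quoted above for the batching window


def build_sqs_mapping_params(queue_arn: str, function_name: str,
                             batch_size: int, window_seconds: int) -> Dict[str, Any]:
    """Return kwargs for create_event_source_mapping, or raise if out of range."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"BatchSize must be between 1 and {MAX_BATCH_SIZE}")
    if not 0 <= window_seconds <= MAX_WINDOW_SECONDS:
        raise ValueError(
            f"MaximumBatchingWindowInSeconds must be between 0 and {MAX_WINDOW_SECONDS}"
        )
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   params = build_sqs_mapping_params(
#       "arn:aws:sqs:us-east-1:123456789012:my-queue", "my-function", 50, 300)
#   boto3.client("lambda").create_event_source_mapping(**params)
```

A two-hour window would be 7,200 seconds, which this check rejects, so the trigger cannot be configured to wait two hours between invocations.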

Two alternatives:

1. Remove the Event Source Mapping. Instead, trigger the Lambda every two hours on a schedule with an EventBridge rule. Your Lambda is then responsible for the SQS ReceiveMessage and DeleteMessageBatch operations. This approach ensures your Lambda is invoked only once per cron event.
2. Keep the Event Source Mapping. Process messages as they arrive, accumulating the partial results in S3. Once every two hours, run a second, EventBridge-triggered Lambda, which bundles the partial results from S3 and sends them to the SFTP server. With this approach you don't control the number of Lambda invocations.
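The first alternative could look roughly like the sketch below. It assumes a handler invoked by a schedule, a `QUEUE_URL` environment variable, and a queue visibility timeout longer than the whole run; the names `drain_queue`, `delete_messages` and `upload_to_sftp` are hypothetical. Messages are deleted only after the file has been sent, so a failed upload leaves them in the queue. An empty response from a standard queue does not strictly prove the queue is empty, hence the small retry count.

```python
import os
from typing import Any, Dict, List


def drain_queue(sqs: Any, queue_url: str, max_empty_polls: int = 2) -> List[Dict[str, Any]]:
    """Call ReceiveMessage until several polls in a row come back empty."""
    messages: List[Dict[str, Any]] = []
    empty_polls = 0
    while empty_polls < max_empty_polls:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # ReceiveMessage returns at most 10 per call
            WaitTimeSeconds=5,       # long polling
        )
        batch = resp.get("Messages", [])
        if batch:
            messages.extend(batch)
            empty_polls = 0
        else:
            empty_polls += 1
    return messages


def delete_messages(sqs: Any, queue_url: str, messages: List[Dict[str, Any]]) -> None:
    """Delete messages in chunks of 10, the DeleteMessageBatch limit."""
    for start in range(0, len(messages), 10):
        chunk = messages[start:start + 10]
        entries = [
            {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
            for i, m in enumerate(chunk)
        ]
        resp = sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
        if resp.get("Failed"):
            raise RuntimeError(f"could not delete {len(resp['Failed'])} messages")


def upload_to_sftp(file_body: str) -> None:
    """Hypothetical placeholder: the SFTP upload is not specified in the thread."""
    raise NotImplementedError


def handler(event, context):
    import boto3  # imported here so the helpers above work without boto3 installed

    sqs = boto3.client("sqs")
    queue_url = os.environ["QUEUE_URL"]  # assumed environment variable
    messages = drain_queue(sqs, queue_url)
    if messages:
        upload_to_sftp("\n".join(m["Body"] for m in messages))
        delete_messages(sqs, queue_url, messages)
    return {"processed": len(messages)}
```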

Note on scaling:

With the SQS Event Source Mapping integration you can tweak the batch settings, but ultimately the Lambda service is in charge of Lambda scaling. As the AWS blog post "Understanding how AWS Lambda scales with Amazon SQS standard queues" says:

"Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages."

You could theoretically restrict the number of concurrent Lambda executions with reserved concurrency, but you would risk dropped messages due to throttling errors.
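For reference, reserved concurrency can be set with boto3's `put_function_concurrency`. The sketch below is illustrative only: the wrapper name `limit_concurrency` and the function name in the usage comment are hypothetical, and the throttling risk described above still applies.

```python
from typing import Any


def limit_concurrency(lambda_client: Any, function_name: str, limit: int = 1) -> int:
    """Set reserved concurrency for a function and return the value now in effect."""
    resp = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
    return resp["ReservedConcurrentExecutions"]


# Usage (requires boto3 and AWS credentials; "my-function" is a placeholder):
#   import boto3
#   limit_concurrency(boto3.client("lambda"), "my-function", 1)
```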

You could try to set the ReservedConcurrency of the function to 1. That may help. See the docs for reference.
A simple solution would be to create a CloudWatch Events trigger (similar to a cron job) that invokes your Lambda function every two hours. In the Lambda function, you call ReceiveMessage on the queue until you have received all messages, process them, and afterwards delete them from the queue. The drawback is that there may be too many messages to process within Lambda's 15-minute maximum timeout, so that's something you'd have to manage.
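One way to manage the 15-minute limit is to stop receiving while there is still time left to write and send the file. The sketch below assumes the remaining time is supplied by a callable such as the Lambda context's `get_remaining_time_in_millis`; the name `drain_with_budget` and the two-minute reserve are hypothetical choices, not something from the answer.

```python
from typing import Any, Callable, Dict, List, Tuple


def drain_with_budget(
    sqs: Any,
    queue_url: str,
    remaining_ms: Callable[[], int],
    reserve_ms: int = 120_000,
) -> Tuple[List[Dict[str, Any]], bool]:
    """Receive messages until the queue looks empty or the time budget runs low.

    Returns (messages, drained), where drained is False if the loop stopped
    early to keep reserve_ms available for building and sending the file.
    """
    messages: List[Dict[str, Any]] = []
    while remaining_ms() > reserve_ms:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # ReceiveMessage returns at most 10 per call
            WaitTimeSeconds=5,       # long polling
        )
        batch = resp.get("Messages", [])
        if not batch:
            return messages, True
        messages.extend(batch)
    return messages, False


# Inside a handler (requires boto3; queue_url is a placeholder):
#   messages, drained = drain_with_budget(
#       sqs, queue_url, context.get_remaining_time_in_millis)
#   If drained is False, the handler could write what it has and invoke
#   itself again, or leave the rest for the next scheduled run.
```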