How to handle Dead Letter Queues in Amazon SQS?

I am using an event-driven architecture for one of my projects. Amazon Simple Queue Service supports handling failures.

If a message is not handled successfully, processing never reaches the part where I delete the message from the queue. If it's a one-time failure, it is handled gracefully. However, if it is an erroneous message, it makes its way into the DLQ.

My question is: what should happen with DLQs later on? There are thousands of those messages stuck in the DLQ. How are they supposed to be handled?

I would love to hear some real-life examples and engineering processes that are in place in some organizations.

"It depends!"

Messages would have been sent to the Dead Letter Queue because something didn't happen as expected. It might be due to a data problem, a timeout or a coding error.

You should:

  • Start examining the messages that went to the Dead Letter Queue
  • Try to re-process the messages to determine the underlying cause of the failure (but sometimes it is a random failure that you cannot reproduce)
  • Once a cause is found, update the system to handle that particular use-case, then move on to the next cause
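As a rough illustration of the first step, here is a minimal sketch that peeks at a batch of DLQ messages without deleting them and buckets them by a likely failure reason. It assumes `sqs` is a boto3 SQS client (for example `boto3.client("sqs")`) passed in by the caller; the `classify` rule and the `order_id` field are hypothetical and stand in for whatever your own payloads look like.

```python
import json
from collections import Counter


def sample_dlq(sqs, dlq_url, max_batches=5):
    """Read up to max_batches batches from the DLQ and return the bodies.

    Nothing is deleted: each message becomes visible in the DLQ again
    once its visibility timeout expires.
    """
    bodies = []
    for _ in range(max_batches):
        resp = sqs.receive_message(
            QueueUrl=dlq_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 per call
            WaitTimeSeconds=1,
            VisibilityTimeout=30,
        )
        messages = resp.get("Messages", [])
        if not messages:
            break
        bodies.extend(m["Body"] for m in messages)
    return bodies


def classify(body):
    """Hypothetical triage rule: guess why a message could not be handled."""
    try:
        payload = json.loads(body)
    except ValueError:
        return "not-json"
    if not isinstance(payload, dict):
        return "not-an-object"
    if "order_id" not in payload:  # assumed required field
        return "missing-order_id"
    return "looks-valid"


def summarize(bodies):
    """Count messages per bucket so the largest cause can be tackled first."""
    return Counter(classify(b) for b in bodies)
```

Messages that land in the "looks-valid" bucket are the ones more likely to have failed for transient reasons such as timeouts or locks, so they are reasonable candidates for a re-processing attempt.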

Common causes can be database locks, network errors, programming errors and corrupt data.

It's probably a good idea to set up some sort of monitoring so that somebody investigates more quickly, rather than letting it accumulate to thousands of messages.
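One possible form of such monitoring is a CloudWatch alarm on the DLQ's `ApproximateNumberOfMessagesVisible` metric. The sketch below assumes `cloudwatch` is a boto3 CloudWatch client supplied by the caller and that an SNS topic for notifications already exists; the alarm name, period and threshold are illustrative choices, not recommendations from the answer.

```python
def dlq_alarm_params(queue_name, sns_topic_arn, threshold=0):
    """Build put_metric_alarm arguments that fire when the DLQ is non-empty."""
    return {
        "AlarmName": f"{queue_name}-not-empty",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,                 # evaluate in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_dlq_alarm(cloudwatch, queue_name, sns_topic_arn, threshold=0):
    """Create or update the alarm using the supplied CloudWatch client."""
    cloudwatch.put_metric_alarm(
        **dlq_alarm_params(queue_name, sns_topic_arn, threshold)
    )
```

With a threshold of 0 the alarm fires as soon as a single message is visible in the DLQ; a noisier system might prefer a higher threshold.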

The messages moved to the DLQ are, as you said, considered erroneous.

If the messages are erroneous due to a bug in the code etc., you should redrive these DLQ messages to the source queue once you have fixed the bug, so that they get another chance to be processed.
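A redrive can be started from the SQS console or through the API. The following is a minimal sketch using the SQS `StartMessageMoveTask` operation, assuming `sqs` is a reasonably recent boto3 SQS client passed in by the caller (older boto3 releases do not have this call). When no destination is given, SQS sends each message back to the source queue it came from.

```python
def redrive_dlq(sqs, dlq_url, max_per_second=None):
    """Start moving all messages from the DLQ back to their source queue.

    Returns the task handle, which can be used to track or cancel the move.
    """
    attrs = sqs.get_queue_attributes(
        QueueUrl=dlq_url,
        AttributeNames=["QueueArn"],
    )
    params = {"SourceArn": attrs["Attributes"]["QueueArn"]}
    if max_per_second is not None:
        # Optional throttle so the consumers are not flooded.
        params["MaxNumberOfMessagesPerSecond"] = max_per_second
    return sqs.start_message_move_task(**params)["TaskHandle"]
```

Throttling the move is worth considering when the DLQ holds thousands of messages and the fixed consumer has limited throughput.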

It is very unlikely that "temporarily" erroneous messages are moved to the DLQ if you have already configured maxReceiveCount as 3 or more for your source queue. Temporary problems are mostly absorbed by this retry configuration.
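For reference, maxReceiveCount lives in the source queue's `RedrivePolicy` attribute. A minimal sketch of setting it, assuming `sqs` is a boto3 SQS client supplied by the caller and that the DLQ already exists (the helper names are made up for this example):

```python
import json


def build_redrive_policy(dlq_arn, max_receive_count=3):
    """Return the JSON string SQS expects in the RedrivePolicy attribute."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })


def attach_dlq(sqs, source_queue_url, dlq_arn, max_receive_count=3):
    """Point the source queue at the DLQ with the given retry budget."""
    sqs.set_queue_attributes(
        QueueUrl=source_queue_url,
        Attributes={
            "RedrivePolicy": build_redrive_policy(dlq_arn, max_receive_count),
        },
    )
```

With a value of 3, a message is moved to the DLQ only after it has been received three times without being deleted, which gives transient failures a couple of retries.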

In the end, a DLQ is just an ordinary SQS queue, which retains messages for at most 14 days. Even if there are thousands of messages in it, they will eventually expire and be gone. At this point, there are two options:

  • Messages in the DLQ are "really" erroneous. So look at the metrics, messages and logs to identify the root cause. If there is no bug to fix, it means you are keeping unneeded data in the DLQ, so there is nothing wrong with losing it after 14 days. If there is a bug, fix it and simply redrive the messages from the DLQ to the source queue.
  • You don't want to dig through the messages to identify the reason for the failure, and you only want to persist the message data for historical reasons (god knows why). You can create a Lambda function to poll the messages and persist them in a target database of your choice.
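The second option could look roughly like the sketch below: a Lambda handler, triggered by an SQS event source mapping on the DLQ, that copies each record into a table. SQLite from the standard library is used here purely as a stand-in for the "target database", and the table name, columns and file path are assumptions made for this example.

```python
import sqlite3


def archive_records(conn, event):
    """Persist every SQS record in a Lambda event and return how many were seen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dlq_archive ("
        "message_id TEXT PRIMARY KEY, body TEXT, receive_count INTEGER)"
    )
    rows = [
        (
            record["messageId"],
            record["body"],
            int(record.get("attributes", {}).get("ApproximateReceiveCount", 0)),
        )
        for record in event.get("Records", [])
    ]
    # SQS delivers at least once, so ignore duplicates of the same message id.
    conn.executemany("INSERT OR IGNORE INTO dlq_archive VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def handler(event, context):
    """Lambda entry point; swap the SQLite file for your real database."""
    conn = sqlite3.connect("/tmp/dlq_archive.db")  # placeholder storage only
    try:
        return {"archived": archive_records(conn, event)}
    finally:
        conn.close()
```

When the handler returns without raising, Lambda deletes the batch from the DLQ, so the archive becomes the only remaining copy of those messages.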

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address.
