How to handle Dead Letter Queues in Amazon SQS?
I am using an event-driven architecture for one of my projects. Amazon Simple Queue Service supports handling failures.
If a message is not handled successfully, processing never reaches the part where I delete the message from the queue. If it's a one-time failure, it is handled gracefully. However, if it is an erroneous message, it makes its way into the DLQ.
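The mechanism described here (delete a message only after it has been handled) can be sketched roughly as follows. This is a minimal sketch, not production code. It assumes `sqs` is a boto3 SQS client. `poll_once` and `handle` are hypothetical names, with `handle` standing in for your own processing logic that raises on failure.

```python
import logging
from typing import Any, Callable

log = logging.getLogger(__name__)


def poll_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive one batch and delete only the messages that were handled.

    `sqs` is assumed to be a boto3 SQS client; `handle` is a hypothetical
    processing function that raises an exception on failure.
    Returns the number of messages processed and deleted.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling
    )
    done = 0
    for msg in resp.get("Messages", []):
        try:
            handle(msg["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout and is retried. Once its receive count
            # exceeds maxReceiveCount, SQS moves it to the dead-letter queue.
            log.exception("failed to process message %s", msg["MessageId"])
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        done += 1
    return done
```

A failed message is simply left alone, so SQS does the retrying itself. The consumer never has to move anything to the DLQ by hand.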
My question is: what should happen to the messages in a DLQ later on? There are thousands of them stuck in the DLQ. How are they supposed to be handled?
I would love to hear some real-life examples and the engineering processes that organizations have in place.
"It depends!"
Messages would have been sent to the Dead Letter Queue because something didn't happen as expected. It might be due to a data problem, a timeout, or a coding error.
You should:
Common causes include database locks, network errors, programming errors and corrupt data.
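To find out which of these causes dominates, it can help to sample the DLQ and group the messages before changing any code. This is a minimal sketch that assumes `sqs` is a boto3 SQS client. `sample_dlq` and the `classify` callback are hypothetical names. How you recognize a cause from a message body depends entirely on your message format.

```python
from collections import Counter
from typing import Any, Callable


def sample_dlq(sqs: Any, dlq_url: str, classify: Callable[[str], str],
               batches: int = 5) -> Counter:
    """Peek at DLQ messages and count them by a caller-defined category.

    Messages are NOT deleted. They only stay hidden for VisibilityTimeout
    seconds. `classify` is a hypothetical function that maps a message
    body to a label such as "corrupt data" or "timeout".
    """
    counts: Counter = Counter()
    for _ in range(batches):
        resp = sqs.receive_message(
            QueueUrl=dlq_url,
            MaxNumberOfMessages=10,
            VisibilityTimeout=60,  # hide sampled messages briefly so batches differ
        )
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            counts[classify(msg["Body"])] += 1
    return counts
```

Receiving a message from the DLQ does not remove it. The message only becomes invisible for a while, so this sampling leaves the queue contents intact.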
It's probably a good idea to set up some sort of monitoring so that somebody investigates quickly, rather than letting the DLQ accumulate thousands of messages.
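One common form of such monitoring is a CloudWatch alarm on the DLQ's `ApproximateNumberOfMessagesVisible` metric (namespace `AWS/SQS`, dimension `QueueName`). This sketch assumes `cloudwatch` is a boto3 CloudWatch client and `topic_arn` is an existing SNS topic that notifies whoever owns the queue. The helper name, the alarm name and the 5-minute period are arbitrary choices.

```python
from typing import Any


def alarm_on_dlq(cloudwatch: Any, dlq_name: str, topic_arn: str) -> None:
    """Alarm as soon as any message is visible in the dead-letter queue.

    `cloudwatch` is assumed to be a boto3 CloudWatch client and `topic_arn`
    an existing SNS topic; `alarm_on_dlq` is a hypothetical helper.
    """
    cloudwatch.put_metric_alarm(
        AlarmName=f"{dlq_name}-not-empty",
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesVisible",
        Dimensions=[{"Name": "QueueName", "Value": dlq_name}],
        Statistic="Maximum",
        Period=300,                 # evaluate over 5-minute windows
        EvaluationPeriods=1,
        Threshold=0.0,              # any visible message triggers the alarm
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],   # e.g. page or email the owning team
    )
```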
As you said, messages moved to the DLQ are considered erroneous.
If the messages are erroneous because of a bug in the code (or a similar fixable cause), you should redrive these DLQ messages to the source queue once you have fixed the bug, so that they get another chance to be processed.
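SQS has a built-in DLQ redrive feature in the console, also exposed as the `StartMessageMoveTask` API, and that is usually the first thing to try. The loop below is a hand-rolled sketch of the same idea for cases where you want more control, for example replaying only some messages. It assumes standard (non-FIFO) queues and a boto3 SQS client `sqs`. `redrive` is a hypothetical helper, and it copies only the message body, not message attributes.

```python
from typing import Any


def redrive(sqs: Any, dlq_url: str, source_url: str, max_batches: int = 100) -> int:
    """Copy messages from the DLQ back to the source queue, then delete them.

    Assumes standard (non-FIFO) queues; only the message body is copied.
    Returns the number of messages moved.
    """
    moved = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(QueueUrl=dlq_url, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=1)
        messages = resp.get("Messages", [])
        if not messages:
            break  # DLQ appears empty
        for msg in messages:
            # Send first, delete second: a crash in between yields a
            # duplicate in the source queue, never a lost message.
            sqs.send_message(QueueUrl=source_url, MessageBody=msg["Body"])
            sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=msg["ReceiptHandle"])
            moved += 1
    return moved
```

Sending before deleting favors duplicates over loss. That trade-off is acceptable because standard queues deliver at least once anyway, so consumers should already be idempotent.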
It is very unlikely that "temporarily" erroneous messages are moved to the DLQ if you have configured maxReceiveCount as 3 or more for your source queue. Each message can then be received that many times before SQS moves it, so most temporary problems are ridden out by these retries.
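`maxReceiveCount` lives in the source queue's `RedrivePolicy` attribute, next to the DLQ's ARN. This is a minimal sketch of wiring that up, assuming `sqs` is a boto3 SQS client and the DLQ already exists in the same account and Region, with the same queue type (standard or FIFO) as the source. `attach_dlq` is a hypothetical helper. It also raises the DLQ's retention to the 14-day maximum. For standard queues, a message's expiry is based on its original enqueue timestamp, which is not reset when the message moves to the DLQ. For that reason, AWS recommends giving the DLQ a longer retention period than the source queue.

```python
import json
from typing import Any

MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # 1,209,600 s, the SQS maximum


def attach_dlq(sqs: Any, source_url: str, dlq_url: str, dlq_arn: str,
               max_receive_count: int = 3) -> None:
    """Move messages to the DLQ after `max_receive_count` receives.

    `sqs` is assumed to be a boto3 SQS client; the DLQ must already exist.
    """
    # Keep DLQ messages as long as SQS allows, to leave time for investigation.
    sqs.set_queue_attributes(
        QueueUrl=dlq_url,
        Attributes={"MessageRetentionPeriod": str(MAX_RETENTION_SECONDS)},
    )
    # The redrive policy is a JSON string set on the *source* queue.
    sqs.set_queue_attributes(
        QueueUrl=source_url,
        Attributes={"RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })},
    )
```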
In the end, a DLQ is also an ordinary SQS queue, which retains messages for at most 14 days (the maximum retention period; the default is 4 days). Even if there are thousands of messages there, they will eventually be deleted. At this point, there are two options: