简体   繁体   中英

Managing SQS Queue Manually vs Lambda Trigger

I'm not sure if this would be better served on ServerFault or Software Engineering, willing to move this post if appropriate.

We have somewhat recently started to move some of our data processing pipeline to use queues to manage individual bits of data, whereas previously we had timed lambdas that would pull all data since last change.

While making this change, we noticed that queues didn't work quite as we had anticipated first of all - we thought lambda would just pull items off the queue as the lambdas had availability. Instead, it seems the aws managed lambda trigger grabs a chunk of messages (up to ten) and throws it at the lambda service. If lambda doesn't have availability, the message gets throttled, then replayed after a backoff time, up til our configured replay "error" limit (five). After that, it's thrown into our dead letter queue.

We see a handful of message per day end up in the dead letter queue as a result of throttling. We then throw these back into the main queue (we have a process to do so every handful of hours). However, we weren't 100% sure throttling was the reason for things being pushed over since nothing indicates why the messages are moved over - we just assumed as much because we weren't getting any error logs for those messages. We contacted Amazon support to ask about this, and they were able to actually confirm the messages were in fact "errored" as a result of throttling.

We asked further into their recommendations for this - this must be a common problem right? They first off suggested upping our replay limit, which seemed an obvious no go. Replays occur for any failure, so that would just hammer our lambdas with bad requests when they came through. Asked also if there's any way to differentiate the errors because we don't care for throttling, we'd happily let those retry a dozen times if needed - but no. The other suggestion they had was to manage the queue ourselves from our lambdas. Build our own code within our lambdas to pull messages and then delete them after processing. This seems really counter-intuitive, though - why would every AWS consumer build their own infrastructure?

So I guess my question is, is this what others are doing? Are you using the built in lambda triggers? Are you creating your own code for managing queue consumption? Do you see these sorts of throttling, or is there anything we could do differently? Are there any difference with other services to manage this?

Best practice is to handle errors in your code and manually delete messages that have succeeded. That allows you to handle poison messages without reprocessing the good messages again. Throttles shouldn't be ending up in a DLQ that often. This video from re:Invent 2020 has a good explaination of how this works. Scalable serverless event-driven architectures with SNS, SQS & Lambda . Start at about the 20 minutes mark to get into SQS error handling.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM