
Trigger Lambda by number of SQS messages

I have an SQS queue that receives a huge number of messages, and they keep arriving continuously.

My use case is: when the number of messages in the queue reaches some threshold X (say, 1,000), the system needs to trigger an event that processes those 1,000 messages at once.

The system should fire a series of such triggers, each covering 1,000 messages.

For example, if we have 2,300 messages in the queue, we expect 3 invocations of a Lambda function: the first 2 with 1,000 messages each, and the last with the remaining 300.

From my research I see that a CloudWatch alarm can hook into the SQS "NumberOfMessagesReceived" metric and publish to SNS, but I don't know how to configure a separate alarm for every 1,000 messages.

Please advise whether AWS supports this use case out of the box, or what customization we could make to achieve it.


After going through some clarifications with the OP in the comments, here's my answer (combined with @ChrisPollard's comment):

Achieving what you want with SQS alone is impossible, because a receive batch can contain at most 10 messages. Since you need to process 1,000 messages at once, that's definitely a no-go.

@ChrisPollard suggested creating a new record in DynamoDB every time a new file is pushed to S3. This is a very good approach. Increment the partition key by 1 on every insert and trigger a Lambda through DynamoDB Streams. In the function, run a check against the partition key and, when it reaches a multiple of 1,000, query your DynamoDB table for the 1,000 most recently created items (you'll need a Global Secondary Index on your createdAt field). Map these items (or use a ProjectionExpression) into a very minimal JSON that contains only the necessary information. Something like:

[
    {
        "key": "my-amazing-key",
        "bucket": "my-super-cool-bucket"
    },
    ...
]

Each such entry is only 87 bytes long (83 bytes if you take the square brackets out of the game, because they won't be repeated). Even rounding up to 100 bytes per entry, you can still successfully send all 1,000 entries as one SQS message, as that's only around 100 KB of data, well under the 256 KB SQS message size limit.
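Here's a minimal sketch of that Streams-triggered function, assuming a hypothetical table "FileCounter" whose items carry a numeric batchId counter, a hypothetical GSI "CreatedAtIndex" with a constant partition key ("pk" = "FILES") and createdAt as its sort key, and a placeholder queue URL:

    import json
    import boto3

    dynamodb = boto3.client("dynamodb")
    sqs = boto3.client("sqs")

    # Hypothetical resource names -- replace with your own.
    TABLE_NAME = "FileCounter"
    GSI_NAME = "CreatedAtIndex"
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/batch-queue"

    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] != "INSERT":
                continue
            batch_id = int(record["dynamodb"]["NewImage"]["batchId"]["N"])
            if batch_id % 1000 != 0:
                continue  # only act on every 1,000th insert (see point 2 below)
            # Fetch the 1,000 most recently created items via the createdAt GSI.
            resp = dynamodb.query(
                TableName=TABLE_NAME,
                IndexName=GSI_NAME,
                KeyConditionExpression="pk = :p",
                ExpressionAttributeValues={":p": {"S": "FILES"}},
                ScanIndexForward=False,  # newest first
                Limit=1000,
                ProjectionExpression="#k, #b",
                ExpressionAttributeNames={"#k": "key", "#b": "bucket"},
            )
            files = [
                {"key": item["key"]["S"], "bucket": item["bucket"]["S"]}
                for item in resp["Items"]
            ]
            # ~100 bytes per entry * 1,000 entries ≈ 100 KB, under the 256 KB cap.
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(files))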

Then have one Lambda function subscribed to that SQS queue pick up the message and finally concatenate the 1,000 files.
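And a minimal sketch of that consumer, assuming the message body is the JSON list above and a hypothetical output bucket and key; error handling and streaming are omitted for brevity:

    import json
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        for record in event["Records"]:  # the SQS event source delivers records in batches
            files = json.loads(record["body"])
            parts = []
            for f in files:
                obj = s3.get_object(Bucket=f["bucket"], Key=f["key"])
                parts.append(obj["Body"].read())
            # Write the concatenation to a placeholder destination.
            s3.put_object(
                Bucket="my-output-bucket",        # placeholder
                Key="concatenated/output.dat",    # placeholder
                Body=b"".join(parts),
            )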

Things to keep in mind:

  1. Make sure you really populate the createdAt field in DynamoDB. By the time the counter hits one thousand, newer items may already have been inserted; sorting on createdAt ensures you read exactly the 1,000 items you expected.

  2. In your Lambda, just check batchId % 1000 == 0; this way you never need to reset or delete anything, saving DynamoDB operations.

  3. Watch out for the execution time of your Lambda. Concatenating 1,000 files at once may take a while to run, so I'd run a couple of tests and put 1 minute of overhead on top. I.e., if it usually takes 5 minutes, set your function's timeout to 6 minutes.

If you have new info to share I am happy to edit my answer.

You can add alarms at 1k, 2k, 3k, etc., but that seems clunky.

Is there a reason you're letting the messages batch up? You could make this event-based (fire the Lambda whenever a queue message is added) and get rid of the complications of batching them.

I handled a very similar situation recently: process-A puts objects in an S3 bucket, and every time it does, it puts a message with the key and bucket details in the SQS queue. I have a Lambda that is triggered every hour, but it could be any trigger, like your CloudWatch alarm. Here is what you can do on every trigger:

  • Read the messages from the queue. SQS lets you read only 10 messages at a time, so keep appending each batch to a list in your Lambda. You also get a receipt handle for every message, which you can use to delete it; repeat this process until you have read all 1,000 messages in your queue. Then perform whatever operations are required on your list and feed the result to process B in any of several ways, e.g. a file in S3 and/or a new queue that process B can read from.

  • Alternate approach to reading messages: since SQS returns at most 10 messages per read, pass the optional parameter VisibilityTimeout=60, which hides each read message from the queue for 60 seconds, and keep reading until you don't see any more messages, adding them to a list in the Lambda as you go (see the sketch after this list). This can be tricky, since you have to try different visibility timeout values depending on how long it takes to read 1,000 messages. Once you know you've read all the messages, you already hold the receipt handles and can delete all of them. You could also purge the queue instead, but you may then delete messages that arrived during this process and were never read even once.
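A minimal sketch of that read-and-drain loop, assuming a placeholder queue URL; the visibility timeout of 60 seconds is the value you'd tune as described above:

    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-events"  # placeholder

    def drain_queue():
        messages = []
        while True:
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL,
                MaxNumberOfMessages=10,  # SQS hard limit per read
                VisibilityTimeout=60,    # hide read messages for 60 s
                WaitTimeSeconds=1,
            )
            batch = resp.get("Messages", [])
            if not batch:
                break  # nothing left that isn't already hidden by the timeout
            messages.extend(batch)
        return messages

    def delete_all(messages):
        # Delete in batches of 10 using the receipt handles collected above.
        for i in range(0, len(messages), 10):
            chunk = messages[i:i + 10]
            sqs.delete_message_batch(
                QueueUrl=QUEUE_URL,
                Entries=[
                    {"Id": str(n), "ReceiptHandle": m["ReceiptHandle"]}
                    for n, m in enumerate(chunk)
                ],
            )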
