
Celery SQS + Duplication of tasks + SQS visibility timeout

Most of my Celery tasks have an ETA longer than the maximum visibility timeout defined by Amazon SQS.

The Celery documentation says:

This causes problems with ETA/countdown/retry tasks where the time to execute exceeds the visibility timeout; in fact if that happens it will be executed again, and again in a loop.

So you have to increase the visibility timeout to match the time of the longest ETA you're planning to use.

At the same time it also says that:

The maximum visibility timeout supported by AWS as of this writing is 12 hours (43200 seconds):

What should I do to keep tasks from being executed multiple times in my workers if I am using SQS?

Generally it's not a good idea to have tasks with very long ETAs.

First of all, there is the "visibility_timeout" issue. You probably don't want a very large visibility timeout: if the worker crashes one minute before the task is due to run, the queue will still wait for the visibility timeout to expire before handing the task to another worker, and I guess you don't want that wait to be another month.

From the Celery docs:

Note that Celery will redeliver messages at worker shutdown, so having a long visibility timeout will only delay the redelivery of 'lost' tasks in the event of a power failure or forcefully terminated workers.
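For reference, here is a minimal sketch of where this setting lives, assuming a celeryconfig.py-style settings module. The module layout and the constant name MAX_SQS_VISIBILITY_TIMEOUT are assumptions for illustration; the value is simply the 12-hour ceiling quoted in the question, not a recommendation.

```python
# celeryconfig.py-style settings (the module layout is an assumption).
broker_url = "sqs://"  # AWS credentials are picked up from the environment

# Hypothetical constant: the 12-hour SQS ceiling quoted in the question.
MAX_SQS_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 43200 seconds

broker_transport_options = {
    # Seconds a received-but-unacked message stays hidden from other workers.
    "visibility_timeout": MAX_SQS_VISIBILITY_TIMEOUT,
}
```

Any ETA or countdown longer than this value is where the redelivery loop described in the docs can start.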

Also, SQS limits how many messages can be waiting to be acked at any one time.

SQS calls these "in-flight messages". From http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-visibility-timeout.html :

A message is considered to be in flight after it's received from a queue by a consumer, but not yet deleted from the queue.

For standard queues, there can be a maximum of 120,000 inflight messages per queue. If you reach this limit, Amazon SQS returns the OverLimit error message. To avoid reaching the limit, you should delete messages from the queue after they're processed. You can also increase the number of queues you use to process your messages.

For FIFO queues, there can be a maximum of 20,000 inflight messages per queue. If you reach this limit, Amazon SQS returns no error messages.

I see two possible solutions: either use RabbitMQ instead, which doesn't rely on visibility timeouts (there are "RabbitMQ as a service" offerings if you don't want to manage your own), or change your code to use really small ETAs (best practice).
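One way to keep ETAs small, sketched here under assumptions rather than taken from the answer itself, is to store the real due time in the task arguments and re-enqueue in short hops until it arrives. MAX_HOP, next_countdown and deliver_or_defer are hypothetical names, and the 30-minute hop is an arbitrary value chosen to sit well below the visibility timeout.

```python
import math
from datetime import datetime, timedelta, timezone

# Assumed safety margin: each hop stays well under the queue's visibility timeout.
MAX_HOP = timedelta(minutes=30)


def next_countdown(run_at, now, max_hop=MAX_HOP):
    """Return the seconds to wait before the next hop; 0 means the task is due."""
    remaining = run_at - now
    if remaining <= timedelta(0):
        return 0
    # Round up so the final hop never fires before run_at.
    return math.ceil(min(remaining, max_hop).total_seconds())


def deliver_or_defer(run_at, run, defer, now=None):
    """Call run() if the task is due, otherwise defer(countdown) for one short hop."""
    now = now or datetime.now(timezone.utc)
    countdown = next_countdown(run_at, now)
    if countdown == 0:
        return run()
    return defer(countdown)
```

Inside a Celery task, defer could be something like lambda s: my_task.apply_async(args=[...], countdown=s), where my_task is a placeholder for your own task. Each message then spends at most MAX_HOP waiting in a worker, at the cost of extra queue round trips.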

These are my 2 cents, maybe @asksol can provide some extra insights.

Celery is known as an asynchronous task scheduler. The number of tasks doesn't really matter: if you send a task to the queue, Celery will run it unless there is an error in the code. You have to check for, or restrict, duplicate tasks yourself before sending them to the queue.
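A minimal sketch of such a check, assuming each task can be identified by a key of your choosing. PendingTasks and enqueue_once are hypothetical names, and the in-memory set only works within a single process; with several producers the same idea would need a shared store.

```python
import threading


class PendingTasks:
    """In-memory registry of task keys that are already queued (illustrative only)."""

    def __init__(self):
        self._keys = set()
        self._lock = threading.Lock()

    def claim(self, key):
        """Return True only for the first caller to claim this key."""
        with self._lock:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True

    def release(self, key):
        """Forget the key once the task has finished."""
        with self._lock:
            self._keys.discard(key)


def enqueue_once(registry, key, send):
    """Call send() only if no task with this key is already pending."""
    if not registry.claim(key):
        return False
    send()
    return True
```

Here send stands for whatever actually publishes the task, for example a call to the task's delay() method, and the task would call release(key) when it completes.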

In SQS you can change the visibility timeout of a message (see the ChangeMessageVisibility action in the SQS documentation). So while you are processing the message, you can keep extending its visibility timeout at regular intervals, and once you are done you can delete the message.

To extend the visibility timeout regularly when you are processing in a loop, you can extend it at the end of each iteration, or every x iterations, depending on how long one iteration takes. Here is some sample code showing what I mean.

process_message() {
  for (i = 0; ..; i++) {
    // ... per-iteration work ...
    if (i % 5 == 0) {
      // every 5th iteration, push the visibility timeout out again
      extendVisibilityTimeOut(..)
    }
  }
}
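The pseudocode above could look roughly like this in Python. This is a sketch, assuming sqs is a boto3 SQS client (for example boto3.client("sqs")) and that you are consuming SQS directly rather than through Celery; the function signature, the handle callback and the default values are assumptions. Note that SQS caps the total visibility timeout of a message at 12 hours from when it was first received, so extending it does not lift that ceiling.

```python
def process_message(sqs, queue_url, receipt_handle, items, handle,
                    extend_every=5, extend_by=300):
    """Process items, periodically extending the message's visibility timeout.

    `sqs` is assumed to be a boto3 SQS client; `handle` is a hypothetical
    callback that does the work for one item.
    """
    for i, item in enumerate(items, start=1):
        handle(item)
        if i % extend_every == 0:
            # Restart the window: the message stays hidden for
            # extend_by more seconds, counted from this call.
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=extend_by,
            )
    # All work is done: delete the message so it is never redelivered.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
```

Choose extend_every and extend_by so that extend_every iterations always finish comfortably within extend_by seconds; otherwise the message can become visible again mid-processing.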

