简体   繁体   中英

How to schedule a periodic Celery task per Django model instance?

I have a bunch of Feed objects in my database, and I'm trying to get each Feed to be updated every hour. My issue here is that I need to make sure there aren't any duplicate updates -- it needs to happen no more than once an hour, but I also don't want feeds waiting two hours for an update. (It's okay if it happens every hour +/- a few minutes, but twice in a few minutes is bad.)

I'm using Django and Celery with Amazon SQS as a broker. I have the feed update code set up as a Celery task, but I'm failing to find a way to prevent duplicates while remaining compatible with Celery running on multiple nodes.

My current solution is to add a last_update_scheduled attribute to the Feed model and run the following task every 5 minutes (pseudo-code):

threshold = datetime.now() - timedelta(seconds=3600)
for f in Feed.objects.filter(Q(last_update_scheduled__lt = threshold) |
                             Q(last_update_scheduled = None)):
    updateFeed.delay(f)
    f.last_update_scheduled = now
    f.save()

This is susceptible to a number of synchronization issues. For example, if my task queues get backed up, this task could run twice at the same time, causing duplicate updates. I've seen some solutions for this (like Celery's recipe and an adaptation on Stack Overflow ), but the memcached solution isn't reliable, eg duplicates could happen when restarting memcached or if it happens to run out of memory and purge old data. Not to mention I'd hate to have to add memcached to my production configuration just for a simple lock.

In a perfect world, I'd like to be able to say:

@modelTask(Feed, run_every=3600)
def updateFeed(feed):
    # do something expensive

But so far my imagination fails me on how to implement that decorator.

To be clear, the Celery recipe is not using memcached per se, but rather Django's caching middleware. There are a number of other caching methods that would suit your needs without the downside of memcached. See the Django caching documentation for details.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM