
How to schedule a periodic Celery task per Django model instance?

I have a bunch of Feed objects in my database, and I'm trying to get each Feed to be updated every hour. My issue is that I need to make sure there aren't any duplicate updates: each feed should be updated no more than once an hour, but I also don't want feeds waiting two hours for an update. (It's okay if it happens every hour +/- a few minutes, but twice in a few minutes is bad.)

I'm using Django and Celery with Amazon SQS as a broker. I have the feed update code set up as a Celery task, but I'm failing to find a way to prevent duplicates while remaining compatible with Celery running on multiple nodes.

My current solution is to add a last_update_scheduled attribute to the Feed model and run the following task every 5 minutes (pseudo-code):

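from datetime import datetime, timedelta
from django.db.models import Q

# Queue an update for every feed not scheduled within the last hour.
now = datetime.now()
threshold = now - timedelta(seconds=3600)
for f in Feed.objects.filter(Q(last_update_scheduled__lt=threshold) |
                             Q(last_update_scheduled=None)):
    updateFeed.delay(f)
    f.last_update_scheduled = now
    f.save()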

This is susceptible to a number of synchronization issues. For example, if my task queues get backed up, this task could run twice at the same time, causing duplicate updates. I've seen some solutions for this (like Celery's recipe and an adaptation on Stack Overflow), but the memcached solution isn't reliable: duplicates could happen when memcached is restarted, or if it runs out of memory and purges old data. Not to mention I'd hate to have to add memcached to my production configuration just for a simple lock.
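One way to picture a fix for the race described above is to make "check the timestamp" and "write the timestamp" a single conditional UPDATE, so that only one caller can win. The following is a minimal sketch under stated assumptions, not a drop-in solution: it uses the standard-library sqlite3 module as a stand-in for the real database, and the feed table layout and the claim_feed helper are hypothetical names introduced for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def claim_feed(conn, feed_id, now, interval=timedelta(hours=1)):
    """Atomically mark a feed as scheduled; return True only for the winner."""
    threshold = (now - interval).isoformat()
    cur = conn.execute(
        "UPDATE feed SET last_update_scheduled = ? "
        "WHERE id = ? AND (last_update_scheduled IS NULL "
        "OR last_update_scheduled < ?)",
        (now.isoformat(), feed_id, threshold),
    )
    conn.commit()
    # rowcount is 1 if this call changed the row, 0 if the feed was not due
    # (for example because another caller claimed it first).
    return cur.rowcount == 1


# Hypothetical table standing in for the Feed model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed (id INTEGER PRIMARY KEY, last_update_scheduled TEXT)"
)
conn.execute("INSERT INTO feed (id) VALUES (1)")

now = datetime(2024, 1, 1, 12, 0, 0)
first = claim_feed(conn, 1, now)                          # never scheduled: claimed
second = claim_feed(conn, 1, now + timedelta(minutes=5))  # too soon: rejected
third = claim_feed(conn, 1, now + timedelta(minutes=61))  # due again: claimed
```

In Django the same idea could probably be expressed with Feed.objects.filter(...).update(...), which returns the number of rows matched, calling updateFeed.delay only when that number is 1. Whether this holds up under load depends on the database and its isolation level, so treat it as a starting point.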

In a perfect world, I'd like to be able to say:

@modelTask(Feed, run_every=3600)
def updateFeed(feed):
    # do something expensive

But so far my imagination fails me on how to implement that decorator.
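For illustration only, here is one rough shape such a decorator could take. It is a plain-Python sketch, not Celery's API: modelTask simply records the function in a registry, and a separate dispatch_due function (a hypothetical helper that a periodic task would call) enqueues the instances that are due. The Feed class below is a stand-in for the Django model, and enqueue is an assumed callable that would wrap something like updateFeed.delay.

```python
from datetime import datetime, timedelta

_registry = []  # (model, interval, func) triples recorded by the decorator


def modelTask(model, run_every):
    """Hypothetical decorator: register func to run per instance every run_every seconds."""
    def decorator(func):
        _registry.append((model, timedelta(seconds=run_every), func))
        return func
    return decorator


def dispatch_due(instances, now, enqueue):
    """Call enqueue(func, instance) for every registered instance that is due."""
    for model, interval, func in _registry:
        for obj in instances:
            if not isinstance(obj, model):
                continue
            last = obj.last_update_scheduled
            if last is None or now - last >= interval:
                obj.last_update_scheduled = now
                enqueue(func, obj)


class Feed:  # stand-in for the Django model
    def __init__(self, name):
        self.name = name
        self.last_update_scheduled = None


@modelTask(Feed, run_every=3600)
def updateFeed(feed):
    return f"updated {feed.name}"


queued = []
feeds = [Feed("a"), Feed("b")]
start = datetime(2024, 1, 1, 12, 0)

def record(func, obj):
    queued.append(obj.name)

dispatch_due(feeds, start, record)                         # both feeds are due
dispatch_due(feeds, start + timedelta(minutes=5), record)  # nothing is due yet
```

This sketch does nothing about concurrency on its own. With several nodes, the due check and the timestamp write inside dispatch_due would have to become one atomic operation in shared storage.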

To be clear, the Celery recipe is not using memcached per se, but rather Django's cache framework. There are a number of other cache backends that would suit your needs without the downsides of memcached. See the Django caching documentation for details.
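As a hedged example of what that could look like, the fragment below points Django's cache at a database table instead of memcached and wraps cache.add in a small claim helper. The backend path is Django's database cache backend; the table name feed_lock_cache, the key format, and the try_claim helper are assumptions made for this sketch.

```python
# settings.py fragment: back Django's cache with a database table.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "feed_lock_cache",  # hypothetical table name
    }
}


def try_claim(cache, feed_id, ttl=3600):
    """Return True if this caller set the key, False if it already existed."""
    # cache.add only stores the value when the key is absent.
    return cache.add(f"feed-update-{feed_id}", "locked", ttl)
```

The cache table would be created with python manage.py createcachetable, and the cache argument would normally be django.core.cache.cache. How strictly add behaves as an atomic operation depends on the backend, so this is worth verifying against the chosen backend before relying on it as a lock.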
