
How to rate limit Celery tasks by task name?

I'm using Celery to process asynchronous tasks from a Django app. Most tasks are short and run in a few seconds, but I have one task that can take a few hours.

Due to processing restrictions on my server, Celery is configured to run only 2 tasks at once. That means if someone launches two of these long-running tasks, it effectively blocks all other Celery processing site-wide for several hours, which is very bad.
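For reference, that limit is the worker's concurrency setting, which in my setup is something like the following (proj is a placeholder for the app name; the same limit can also come from the CELERYD_CONCURRENCY setting):

celery -A proj worker --concurrency=2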

Is there any way to configure Celery so that no more than one instance of a given task type runs at a time? Something like:

@task(max_running_instances=1)
def my_really_long_task():
    for i in range(1000000000):
        time.sleep(6000)

Note, I don't want to cancel other launches of my_really_long_task. I just don't want them to start right away; they should begin only once all other tasks of the same name have finished.

Since this doesn't seem to be supported by Celery, my current hacky solution is to query the active tasks from within the task itself, and if we find other running instances, reschedule ourselves to run later, e.g.:

from collections import defaultdict
import random
import time

from celery.task import task
from celery.task.control import inspect

def get_all_active_celery_task_names(ignore_id=None):
    """
    Returns a {task_name: count} mapping of all currently running tasks,
    optionally skipping the task with the given id (e.g. ourselves).
    """
    task_names = defaultdict(int)  # {name: count}
    i = inspect()
    if i:
        active = i.active()  # {worker_name: [task_info, ...]} or None
        if active is not None:
            for worker_name, tasks in active.items():
                for task_info in tasks:
                    if ignore_id and task_info['id'] == ignore_id:
                        continue
                    task_names[task_info['name']] += 1
    return task_names

@task
def my_really_long_task():
    # Skip our own task id, otherwise we would always "find" ourselves
    # in the active list and retry forever.
    all_names = get_all_active_celery_task_names(
        ignore_id=my_really_long_task.request.id)
    if my_really_long_task.name in all_names:
        my_really_long_task.retry(max_retries=100,
                                  countdown=random.randint(10, 300))
        return

    for i in range(1000000000):
        time.sleep(6000)

Is there a better way to do this?

I'm aware of other hacky solutions like this one, but setting up a separate memcached server to track task uniqueness is even less reliable, and more complicated, than the method I use above.
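For reference, the cache-lock approach mentioned above usually looks something like the sketch below (assuming Django's cache framework backed by memcached; do_the_actual_work and the 12-hour lock timeout are placeholders, not part of any Celery API):

import random

from django.core.cache import cache

from celery.task import task

LOCK_EXPIRE = 60 * 60 * 12  # safety timeout so a crashed task can't hold the lock forever

@task
def my_really_long_task():
    lock_id = 'my_really_long_task-lock'
    # cache.add is atomic on memcached: it only succeeds if the key does
    # not already exist, so it can serve as a crude distributed mutex.
    if not cache.add(lock_id, 'locked', LOCK_EXPIRE):
        # Another instance holds the lock; retry later.
        my_really_long_task.retry(max_retries=100,
                                  countdown=random.randint(10, 300))
        return
    try:
        do_the_actual_work()  # placeholder for the real task body
    finally:
        cache.delete(lock_id)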

An alternative solution is to send my_really_long_task to a separate queue:

my_really_long_task.apply_async(args, queue='foo')

Then start a dedicated worker with a concurrency of 1 to consume these tasks, so that only one of them executes at a time:

celery -A foo worker -l info -Q foo -c 1
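If you don't want to pass queue='foo' at every call site, the task can also be routed to that queue in configuration (a minimal sketch, assuming Celery 3.x-style settings and that the task lives in a hypothetical myapp.tasks module):

CELERY_ROUTES = {
    'myapp.tasks.my_really_long_task': {'queue': 'foo'},
}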
