
Celery shutdown worker and retry the task

I need to implement the following logic in celery task: if some condition is met, shutdown the current worker and retry the task.我需要在 celery 任务中实现以下逻辑:如果满足某些条件,则关闭当前工作人员并重试该任务。

I tested this with the following sample task:

@app.task(bind=True, max_retries=1)
def shutdown_and_retry(self, config):
    try:
        raise Exception('test exception')
    except Exception as exc:
        print('Retry {}/{}, task id {}'.format(self.request.retries, self.max_retries, self.request.id))
        app.control.shutdown(destination=[self.request.hostname])  # send shutdown signal to the current worker
        raise self.retry(exc=exc, countdown=5)
    print('Execute task id={} retries={}'.format(self.request.id, self.request.retries))
    return 'some result'

But it gives strange results. Steps:

  1. Run a worker: celery worker -Q test_queue -A test_worker -E -c 1 -n test_worker_1
  2. Push a task to the "test_queue" queue (see the sketch after this list).
  3. The worker picked it up and shut down. I opened the list of tasks in 'test_queue' in RabbitMQ and saw:
    • the original task submitted by the publisher, retries = 0 (comes back because of the app.control.shutdown() call);
    • a copy of the original task (with the same id), retries = 1 (published by the self.retry() call).
  4. Then I started another worker on the same queue; it picked up a task and shut down as well. But on the broker one more copy of the original task was pushed to the queue with the same id and retries = 1, so I had 3 tasks in the queue. Every subsequent worker run added one more task to the queue. The max_retries = 1 condition did not work in this case.
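A minimal sketch of how the task can be pushed in step 2, assuming the task is importable from the test_worker module and that an empty dict stands in for the real config payload (which is not shown in the question):

from test_worker import shutdown_and_retry

# publish the task to the dedicated queue; the empty dict is a placeholder
# for the real config argument
shutdown_and_retry.apply_async(args=[{}], queue='test_queue')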

What I have tried:

  1. Set task_reject_on_worker_lost = True in celeryconfig.py and ran the same task. Result: nothing changed.
  2. Left only the shutdown call in the worker's task. Result: only the original task is pushed back on each try (there is no task duplication), but the retry counter is never incremented (always 0);
  3. Added app.control.revoke(self.request.id) before the shutdown and retry calls in the worker (based on this; see the sketch after this list). Result: after the first try I got the same picture (2 tasks in the queue), but when I ran a second worker the queue was flushed and it didn't run anything. So the task is lost and not retried.
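A minimal sketch of the variant from item 3, assuming the revoke call is placed inside the same except block of the sample task, right before the shutdown and retry calls:

@app.task(bind=True, max_retries=1)
def shutdown_and_retry(self, config):
    try:
        raise Exception('test exception')
    except Exception as exc:
        app.control.revoke(self.request.id)  # try to drop the current delivery first
        app.control.shutdown(destination=[self.request.hostname])
        raise self.retry(exc=exc, countdown=5)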

Is there a way to avoid pushing the original task back to the queue during the app.control.shutdown() call? It seems that this is the root cause. Or could you suggest another workaround that would allow implementing the logic described above?

Setup: RabbitMQ 3.8.2, celery 4.1.0, python 3.5.4

Settings in celeryconfig.py:

task_acks_late = True
task_acks_on_failure_or_timeout = True
task_reject_on_worker_lost = False
task_track_started = True
worker_prefetch_multiplier = 1
worker_disable_rate_limits = True

It looks like the issue is task_acks_late in your configuration file. By using that, you are saying "Only remove the task from the queue when I have finished running". You then kill the worker, so the task is never acknowledged (and you get duplicates of it).
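A minimal sketch of a workaround that follows from that explanation, assuming acks_late can be disabled for this one task so the original message is acknowledged as soon as the worker picks it up, and that self.retry() with throw=False publishes the retry copy before the shutdown signal is sent; both are standard Celery options, but the exact behaviour during a worker shutdown should be verified:

@app.task(bind=True, max_retries=1, acks_late=False)
def shutdown_and_retry(self, config):
    try:
        raise Exception('test exception')
    except Exception as exc:
        # publish the retry copy first (throw=False suppresses the Retry
        # exception so execution continues), then ask only this worker to stop
        self.retry(exc=exc, countdown=5, throw=False)
        app.control.shutdown(destination=[self.request.hostname])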
