
Celery worker disconnects from broker

I am using Python with RabbitMQ and Celery to distribute tasks to a worker. Each task takes around 15 minutes and is 99% CPU-bound. My system has 24 cores, and whenever the worker executes one of these tasks, I get a connection error to the broker.

[2019-10-12 08:49:57,695: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
[...]
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

I found several other posts describing this issue, but none of their fixes worked for me. Given the heavy CPU load, any idea how I could solve this?

Windows 10 (worker)

macOS 10.14 (RabbitMQ Server)

Python 3.7

Celery 4.3.0 (rhubarb)

RabbitMQ 3.7.16 (Erlang 22.0.7)

My configuration lets the worker consume only one task at a time, and the worker process is even restarted after each job, but still no luck:

CELERYD_MAX_TASKS_PER_CHILD = 1,
CELERYD_CONCURRENCY = 1,
CELERY_TASK_RESULT_EXPIRES = 3600,
CELERYD_PREFETCH_MULTIPLIER = 1,
CELERY_MAX_CACHED_RESULTS = 1,
CELERY_ACKS_LATE = True,
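
For context, here is a minimal sketch of how such settings could be attached to a Celery app. The broker and backend URLs below are placeholders, not taken from the question:

from celery import Celery

# Placeholder URLs; substitute the real RabbitMQ host and result backend.
app = Celery('tasks',
             broker='amqp://guest@localhost//',
             backend='redis://localhost:6379/0')

# Old-style uppercase setting names, as quoted above; Celery 4 also
# accepts lowercase equivalents such as worker_max_tasks_per_child.
app.conf.update(
    CELERYD_MAX_TASKS_PER_CHILD=1,
    CELERYD_CONCURRENCY=1,
    CELERY_TASK_RESULT_EXPIRES=3600,
    CELERYD_PREFETCH_MULTIPLIER=1,
    CELERY_MAX_CACHED_RESULTS=1,
    CELERY_ACKS_LATE=True,
)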

And this is the full traceback:

[2019-10-12 08:49:57,695: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\celery\worker\consumer\consumer.py", line 318, in start
    blueprint.start(self)
  File "C:\Python37\lib\site-packages\celery\bootsteps.py", line 119, in start
    step.start(parent)
  File "C:\Python37\lib\site-packages\celery\worker\consumer\consumer.py", line 596, in start
    c.loop(*c.loop_args())
  File "C:\Python37\lib\site-packages\celery\worker\loops.py", line 118, in synloop
    qos.update()
  File "C:\Python37\lib\site-packages\kombu\common.py", line 442, in update
    return self.set(self.value)
  File "C:\Python37\lib\site-packages\kombu\common.py", line 435, in set
    self.callback(prefetch_count=new_value)
  File "C:\Python37\lib\site-packages\celery\worker\consumer\tasks.py", line 47, in set_prefetch_count
    apply_global=qos_global,
  File "C:\Python37\lib\site-packages\kombu\messaging.py", line 558, in qos
    apply_global)
  File "C:\Python37\lib\site-packages\amqp\channel.py", line 1853, in basic_qos
    wait=spec.Basic.QosOk,
  File "C:\Python37\lib\site-packages\amqp\abstract_channel.py", line 68, in send_method
    return self.wait(wait, returns_tuple=returns_tuple)
  File "C:\Python37\lib\site-packages\amqp\abstract_channel.py", line 88, in wait
    self.connection.drain_events(timeout=timeout)
  File "C:\Python37\lib\site-packages\amqp\connection.py", line 504, in drain_events
    while not self.blocking_read(timeout):
  File "C:\Python37\lib\site-packages\amqp\connection.py", line 509, in blocking_read
    frame = self.transport.read_frame()
  File "C:\Python37\lib\site-packages\amqp\transport.py", line 252, in read_frame
    frame_header = read(7, True)
  File "C:\Python37\lib\site-packages\amqp\transport.py", line 438, in _read
    s = recv(n - len(rbuf))
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

I found a solution for this issue. I believe the problem lies with the Celery result backend; in my case I am using Redis.

The following is my configuration:

Broker - RabbitMQ
Backend - Redis
Python - 3.7
OS - Windows 10

On the Celery client side, I poll the status of the task every 60 seconds. With this polling in place, I did not face the connection reset issue:

from time import sleep

while not doors_res.ready():  # poll the result backend every 60 seconds
    sleep(60)
result = doors_res.get()      # fetch the result once the task has finished

where doors_res is the AsyncResult returned when the task was submitted.
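
Put together, a minimal client-side sketch under these assumptions might look as follows. The task name long_job and both connection URLs are hypothetical placeholders:

from time import sleep
from celery import Celery

# Hypothetical connection URLs matching the setup described above.
app = Celery('tasks',
             broker='amqp://guest@rabbit-host//',
             backend='redis://localhost:6379/0')

# Send a hypothetical long-running task by name; send_task returns
# an AsyncResult without the task function being imported locally.
doors_res = app.send_task('long_job')

# Polling every 60 seconds keeps the client's backend connection
# active while the worker is busy with the CPU-bound task.
while not doors_res.ready():
    sleep(60)
result = doors_res.get()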

On the Celery worker side:

celery worker -A <celery_file_name> -l info -P gevent
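
Note that the gevent pool is not bundled with Celery; assuming pip is available, it can be installed first with:

pip install gevent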

My task runs for around 2 hours, and I did not face the connection reset error.
