I am using Python with RabbitMQ and Celery to distribute tasks to a worker. Each task takes around 15 minutes and is 99% CPU-bound. My system has 24 cores, and whenever the worker executes such a task, I get a connection error to the broker:
[2019-10-12 08:49:57,695: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
[...]
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
I found several other posts describing this issue, but none of their fixes worked for me. Given the heavy CPU load, any idea how I could solve this?
Windows 10 (worker)
macOS 10.14 (RabbitMQ Server)
Python 3.7
Celery 4.3.0 (rhubarb)
RabbitMQ 3.7.16 (Erlang 22.0.7)
My configuration lets the worker consume only one task at a time, and the worker process is even restarted after each job, but still no luck:
CELERYD_MAX_TASKS_PER_CHILD = 1,
CELERYD_CONCURRENCY = 1,
CELERY_TASK_RESULT_EXPIRES = 3600,
CELERYD_PREFETCH_MULTIPLIER = 1,
CELERY_MAX_CACHED_RESULTS = 1,
CELERY_ACKS_LATE = True,
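For context, a minimal sketch of how these (old-style, uppercase) settings could be applied to a Celery app. The app name and broker URL here are placeholders, not from the original post:

```python
from celery import Celery

# Placeholder app name and broker URL -- substitute your own.
app = Celery('tasks', broker='amqp://guest@localhost//')

# Old-style uppercase setting names; still accepted on Celery 4.3.
app.conf.update(
    CELERYD_MAX_TASKS_PER_CHILD=1,   # fresh worker process after every task
    CELERYD_CONCURRENCY=1,           # one task at a time
    CELERY_TASK_RESULT_EXPIRES=3600,
    CELERYD_PREFETCH_MULTIPLIER=1,   # don't prefetch extra messages
    CELERY_MAX_CACHED_RESULTS=1,
    CELERY_ACKS_LATE=True,           # acknowledge only after the task finishes
)
```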
And this is the full traceback:
[2019-10-12 08:49:57,695: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\celery\worker\consumer\consumer.py", line 318, in start
blueprint.start(self)
File "C:\Python37\lib\site-packages\celery\bootsteps.py", line 119, in start
step.start(parent)
File "C:\Python37\lib\site-packages\celery\worker\consumer\consumer.py", line 596, in start
c.loop(*c.loop_args())
File "C:\Python37\lib\site-packages\celery\worker\loops.py", line 118, in synloop
qos.update()
File "C:\Python37\lib\site-packages\kombu\common.py", line 442, in update
return self.set(self.value)
File "C:\Python37\lib\site-packages\kombu\common.py", line 435, in set
self.callback(prefetch_count=new_value)
File "C:\Python37\lib\site-packages\celery\worker\consumer\tasks.py", line 47, in set_prefetch_count
apply_global=qos_global,
File "C:\Python37\lib\site-packages\kombu\messaging.py", line 558, in qos
apply_global)
File "C:\Python37\lib\site-packages\amqp\channel.py", line 1853, in basic_qos
wait=spec.Basic.QosOk,
File "C:\Python37\lib\site-packages\amqp\abstract_channel.py", line 68, in send_method
return self.wait(wait, returns_tuple=returns_tuple)
File "C:\Python37\lib\site-packages\amqp\abstract_channel.py", line 88, in wait
self.connection.drain_events(timeout=timeout)
File "C:\Python37\lib\site-packages\amqp\connection.py", line 504, in drain_events
while not self.blocking_read(timeout):
File "C:\Python37\lib\site-packages\amqp\connection.py", line 509, in blocking_read
frame = self.transport.read_frame()
File "C:\Python37\lib\site-packages\amqp\transport.py", line 252, in read_frame
frame_header = read(7, True)
File "C:\Python37\lib\site-packages\amqp\transport.py", line 438, in _read
s = recv(n - len(rbuf))
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
I found a solution for this issue. I believe the problem is related to the Celery result backend; in my case I am using Redis.
My configuration:
Broker - RabbitMQ
Backend - Redis
Python - 3.7
OS - Windows 10
On the Celery client side, I poll the worker's result every 60 seconds. With this in place I did not face the connection reset issue:
from time import sleep

while not doors_res.ready():  # doors_res is the task's AsyncResult
    sleep(60)                 # poll every 60 seconds to keep the connection alive
result = doors_res.get()
where doors_res is the AsyncResult returned when the task was submitted.
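The polling idea above can be generalized into a small helper. This is a sketch; `wait_with_keepalive` and its parameters are hypothetical names, and any object exposing `ready()` and `get()` (such as a Celery AsyncResult) would work:

```python
import time

def wait_with_keepalive(result, poll_interval=60, ping=None):
    """Block until `result` completes, sleeping between polls.

    `ping` is an optional callable invoked on each iteration to generate
    traffic (e.g. a broker status check) that keeps the connection alive.
    """
    while not result.ready():
        if ping is not None:
            ping()
        time.sleep(poll_interval)
    return result.get()
```

With a real task this would be called as `wait_with_keepalive(doors_res)`.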
On the Celery worker side, I start the worker with the gevent pool:
celery worker -A <celery_file_name> -l info -P gevent
My task runs for around 2 hours, and with this setup I did not face the connection reset error.