
How to implement a multiprocessing pool using Celery

In Python multiprocessing, I am able to create a multiprocessing pool of, say, 30 processes to run some long-running equation on some IDs. The code below spawns 30 processes on an 8-core machine, and the load_average never exceeds 2.0. In fact, 30 consumers is only a limit because the server hosting the PostgreSQL database with the IDs has 32 cores, so I know I could spawn more processes if my database could handle it.

from multiprocessing import Pool
number_of_consumers = 30
pool = Pool(number_of_consumers)
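For context, here is a minimal runnable sketch of the full pattern; process_id and ids are hypothetical stand-ins for the long-running equation and the IDs fetched from PostgreSQL:

from multiprocessing import Pool

def process_id(record_id):
    # hypothetical stand-in for the long-running equation on one ID
    return record_id ** 2

if __name__ == "__main__":
    ids = range(100)  # stand-in for the IDs fetched from PostgreSQL
    with Pool(30) as pool:  # 30 processes on an 8-core machine
        results = pool.map(process_id, ids)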

I have taken the time to set up Celery but am unable to recreate the 30 processes. I thought setting the concurrency, e.g. -c 30, would create 30 processes, but if I am not wrong that option declares how many processors I intend to use, which is wrong as I only have 8! I am also seeing the load_average hit 10.0 on an 8-core machine, which is bad.

[program:my_app]
command = /opt/apps/venv/my_app/bin/celery -A celery_conf.celeryapp worker -Q app_queue -n app_worker --concurrency=30 -l info

So, when using Celery, how can I recreate my 30 processes on an 8-core machine?

Edit: Qualifying the Confusion

I thought I'd attach an image to illustrate my confusion about server load when discussing Celery and Python multiprocessing. The server I am using has 8 cores. Using Python multiprocessing and spawning 30 processes, the load average seen in the attached screenshot is 0.22, meaning, if my Linux knowledge serves me right, that my script is using one core to spawn the 30 processes, hence a very low load_average.

[Screenshot: load_average when using Python multiprocessing]

My understanding of the --concurrency=30 option in Celery is that it instructs Celery how many cores to use rather than how many processes to spawn. Am I right about that? Is there a way to instruct Celery to use 2 cores and spawn 15 processes per core, giving me a total of 30 concurrent processes, so that my server load remains low?

A Celery worker consists of:

  1. Message consumer
  2. Worker Pool

The message consumer fetches the tasks from the broker and sends them to the workers in the pool.
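For reference, a minimal sketch of what the celery_conf module referenced by the question's worker command might contain; the Redis broker URL and the process_id task body are assumptions for illustration, not the asker's actual code:

# celery_conf.py -- minimal sketch; broker URL and task body are assumptions
from celery import Celery

celeryapp = Celery("my_app", broker="redis://localhost:6379/0")

@celeryapp.task
def process_id(record_id):
    # stand-in for the long-running equation applied to one ID
    return record_id ** 2

Each task sent to app_queue is fetched by the message consumer and handed to one of the pool processes.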

The --concurrency or -c argument specifies the number of processes in that pool. So if you're using the prefork pool, which is the default, then --concurrency=30 already gives you 30 processes in the pool. You can verify this by looking at the worker output when it starts; it should contain something like:

concurrency: 30 (prefork)

A note from the docs on concurrency:

Number of processes (multiprocessing/prefork pool)

More pool processes are usually better, but there's a cut-off point where adding more pool processes affects performance in negative ways. There is even some evidence to support that having multiple worker instances running may perform better than having a single worker. For example, 3 workers with 10 pool processes each. You need to experiment to find the numbers that work best for you, as this varies based on application, work load, task run times and other factors.

If you want to start multiple worker instances you should look at celery multi, or start them manually using celery worker.
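For example, the docs' suggestion of 3 workers with 10 pool processes each could be started with something like the following, reusing the app and queue names from the question (exact flags may vary by Celery version):

celery multi start 3 -A celery_conf.celeryapp -Q app_queue -c 10 -l info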
