Python: how to multiprocess with thousands of processes, running only a few at a time

In my multiprocessing flow I will have to start thousands of processes but I am afraid that if I start them all at the same time the server will run out of memory.

I tried to read the python docs, but I was left a little confused as to how to achieve this...

To be clear about what I want to accomplish:

  1. Create many processes.
  2. Set a certain number (x) to run at one time.
  3. Have Python start (x) processes and, as they finish, start more of the created processes, up to a maximum of (x) running at once.

From my (limited) understanding of your question, I think you're equating jobs with processes. In other words, you assume that you need x processes to run x jobs, even though you only want n processes running at once (with n < x). That's what Pools are for.

https://docs.python.org/3.7/library/multiprocessing.html#using-a-pool-of-workers

Basically, you set up a pool of worker processes like so (the code sample is a simplified version of the one in the docs):

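from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:
        # evaluate f(20) asynchronously in one of the workers
        res = pool.apply_async(f, (20,))
        print(res.get(timeout=1))  # prints 400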

This way you only ever start a maximum of 4 worker processes (or n, as your case may be), and you can send them as many tasks as you like. The Pool queues the tasks and hands the next one to a worker as soon as that worker is free.
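To make the "thousands of jobs, a handful of workers" case concrete, here is a minimal sketch. The names work and run_jobs, the job count, and the chunksize value are made up for illustration; work stands in for whatever each of your jobs actually does.

```python
from multiprocessing import Pool
import os


def work(job_id):
    """Hypothetical stand-in for one job: returns the job id and the worker's PID."""
    return job_id, os.getpid()


def run_jobs(n_jobs, n_workers):
    """Run n_jobs tasks on at most n_workers processes and return all results."""
    with Pool(processes=n_workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # so results arrive in completion order, not submission order.
        return list(pool.imap_unordered(work, range(n_jobs), chunksize=50))


if __name__ == "__main__":
    results = run_jobs(n_jobs=5000, n_workers=4)
    pids = {pid for _, pid in results}
    # The number of distinct PIDs never exceeds the pool size.
    print(len(results), "jobs finished on", len(pids), "worker processes")
```

If individual jobs leak memory, Pool also accepts a maxtasksperchild argument that replaces a worker with a fresh process after it has completed that many tasks. Whether you need it depends on your workload.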

EDIT: as Olvin Roght mentioned in the comments, there is also ProcessPoolExecutor: https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor

In my own experience, it depends on what you're trying to achieve. If you just want to grasp multiprocessing better, the regular pool is fine. It's also a lower level of abstraction so it will force you to really understand what you're doing. I learned that way and don't regret it. If it's just to get the job done, the concurrent.futures version is better. Higher level of abstraction and a very nice API.
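For comparison, a rough equivalent with concurrent.futures might look like the sketch below. The function names square and run_all are hypothetical, and max_workers=4 is just an example value.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def square(x):
    """Hypothetical stand-in for one job."""
    return x * x


def run_all(values, max_workers=4):
    """Submit one task per value; at most max_workers processes run at once."""
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so results can be matched up later.
        futures = {executor.submit(square, v): v for v in values}
        # as_completed yields futures in the order they finish.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    squares = run_all(range(1000))
    print(squares[20])  # prints 400
```

All 1000 tasks are submitted up front, but the executor only runs as many at a time as it has worker processes; the rest wait in its queue.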

EDIT 2 - About blocking get() and timeouts.

from multiprocessing import Pool, TimeoutError
import time


def f(x):
    time.sleep(x)
    return "I'm rested now"


if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:
        res = pool.apply_async(f, (10,))  # run in one process
        while True:
            try:
                print(res.get(timeout=1))
            except TimeoutError:
                print("No result yet")
            else:
                break

As you can see, I have a function that sleeps for a number of seconds. I give it to the pool to execute with a value of 10, meaning the function will sleep for 10 seconds inside the worker process before returning its result (this simulates work that takes time).

In the main process, I try to get the result with a timeout of 1s, meaning: after 1s, stop waiting and print that there is no result yet. I do this by catching the TimeoutError raised by get(). All of this happens in a loop, so that when the worker process eventually finishes, I get the result. The reason your worker process terminated is that your main process terminated on the uncaught TimeoutError. That's something to keep in mind: the workers' lifespan is tied to that of the main process.
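If you'd rather not use exceptions for control flow, the same polling loop can be written with the ready() and wait() methods of the AsyncResult object. This is only a sketch: slow_square, poll_until_done, and the timings are invented for the example.

```python
from multiprocessing import Pool
import time


def slow_square(x):
    """Hypothetical job that takes half a second."""
    time.sleep(0.5)
    return x * x


def poll_until_done(x, interval=0.1):
    """Poll for the result every `interval` seconds; return (result, polls)."""
    with Pool(processes=1) as pool:
        res = pool.apply_async(slow_square, (x,))
        polls = 0
        while not res.ready():
            # wait() returns after `interval` seconds or when the result
            # is available, whichever comes first; it never raises.
            res.wait(timeout=interval)
            polls += 1
        # The result is fetched before the with-block shuts the pool down.
        return res.get(), polls


if __name__ == "__main__":
    result, polls = poll_until_done(4)
    print(result, "after", polls, "polls")
```

The main process stays alive inside the with-block until the result is ready, so the worker is not torn down early.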

PS: If you need more help, share some code and give me a better idea of what you're trying to achieve. I can barely estimate your understanding of these concepts, let alone guess what you're ultimately trying to do, which makes it hard to be helpful. What I mean is: if all you want is to understand multiprocessing concepts, this is all fine, and I think the multiprocessing docs and asking questions on SO will get you there. But if you are actually trying to get some work done, there are higher-level libraries that will do most of this heavy lifting for you (Celery comes to mind, but it's not the only one).

