
Multiprocessing in Python: can a multiprocessed function call functions as multiprocesses?

Recently I have started using ProcessPoolExecutor in Python to speed up my processing.

So instead of doing this:

import pandas as pd

list_of_res = []
for n in range(a_number):
    res = calculate_something(list_of_sources[n])
    list_of_res.append(res)
joint_results = pd.concat(list_of_res)

I do

from concurrent.futures import ProcessPoolExecutor

# run under the `if __name__ == "__main__":` guard on spawn-based platforms
with ProcessPoolExecutor(max_workers=8) as executor:
    joint_results = pd.concat(executor.map(calculate_something, list_of_sources))

It works great.

However, I've noticed that inside the calculate_something function I call the same function about 8 times in a row, so I might as well map over those calls instead of looping.

My question is: can I apply multiprocessing inside a function that is already running as a multiprocess?

Yes, you can have a worker process spawn another pool of workers, but it is not optimal.

Each time you launch a new process, it takes anywhere from a few hundred milliseconds to a few seconds for that process to initialize and start executing work (depending on the OS, the disk, and your code).
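A rough way to see that cost (a minimal sketch, not part of the original answer): time how long a fresh pool takes to return its first trivial task, which includes spawning the worker.

import time
from concurrent.futures import ProcessPoolExecutor

def noop():
    return None

if __name__ == "__main__":
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=1) as pool:
        # the first result includes worker spawn time; numbers vary by OS and machine
        pool.submit(noop).result()
    print(f"first task, including spawn: {time.perf_counter() - start:.3f}s")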

Launching a worker from a worker just wastes the overhead already spent spawning the first child. You are better off extracting the loop from inside calculate_something and submitting that work directly to your initial executor, as in the sketch below.
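A minimal sketch of that flattening, where split_source and calculate_one are hypothetical stand-ins for the pieces the inner loop iterated over and the work each iteration did:

from concurrent.futures import ProcessPoolExecutor
from itertools import chain

import pandas as pd

def split_source(source):
    # hypothetical: yield the ~8 sub-pieces the inner loop iterated over
    return [source * 10 + i for i in range(8)]

def calculate_one(sub_source):
    # hypothetical: the work a single inner-loop iteration did
    return pd.DataFrame({"value": [sub_source]})

if __name__ == "__main__":
    list_of_sources = [1, 2, 3]  # stand-in for the question's data
    # flatten all per-source sub-tasks into one list...
    sub_sources = list(chain.from_iterable(split_source(s) for s in list_of_sources))
    # ...and let a single pool spread them across its workers
    with ProcessPoolExecutor(max_workers=8) as executor:
        joint_results = pd.concat(executor.map(calculate_one, sub_sources))
    print(joint_results)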

A better approach is to launch your initial calculate_something with a ThreadPoolExecutor and keep one shared ProcessPoolExecutor that all the thread workers push work into. This way you limit the number of newly created processes and avoid creating and destroying far more workers than you actually need, and launching a thread pool takes only a few microseconds.

Here is an example of how to nest a thread pool and a process pool:

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def process_worker(n):
    # the CPU-bound work runs in the process pool
    print(n)
    return n

def thread_worker(list_of_n, process_pool: ProcessPoolExecutor):
    # each thread pushes its batch into the one shared process pool
    work_done = list(process_pool.map(process_worker, list_of_n))
    return work_done

if __name__ == "__main__":
    list_of_lists_of_n = [[1, 2, 3], [4, 5, 6]]
    with ProcessPoolExecutor() as process_pool, ThreadPoolExecutor() as threadpool:
        tasks = []
        work_done = []
        for item in list_of_lists_of_n:
            # threads are cheap, so one thread per batch; they all share process_pool
            tasks.append(threadpool.submit(thread_worker, item, process_pool))
        for item in tasks:
            work_done.append(item.result())
    print(work_done)
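When run, the numbers 1 through 6 are printed by the process workers (in whatever order the pool schedules them), and the final print shows [[1, 2, 3], [4, 5, 6]]: each thread collects the results of its own batch while both threads share the single process pool.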
