
Parallelizing a CPU-bound Python function

I have a CPU-bound Python function that takes around 15 seconds to run on a standard core. I need to run this function tens of thousands of times. The function input is a dataset around 10kB in size, so data transfer time should be negligible compared to the runtime. The functions do not need to communicate with each other. The return value is a small array.

I do not need to synchronize these functions at all. All I care about is that when one core finishes, it gets delegated a new job.

What is a good framework to start parallelizing this problem with? I would like to be able to run this on my own computers and also on Amazon EC2 instances.

Would Python's multiprocessing module do the trick? Would I be better off with something other than that?

If no communication is needed, the simplest approach is Pool.map. It works like the built-in map function, but each item is processed in one of the child processes.

import multiprocessing

def fu(chunk):
    # your CPU-bound code goes here
    result = sum(chunk)  # placeholder computation
    return result

def produce_data(data, chunk_size=100):
    # split the data into chunks; each chunk becomes one task
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

if __name__ == "__main__":
    data = list(range(10_000))  # example input
    # create the pool after the function definitions, inside the __main__ guard
    with multiprocessing.Pool(processes=4) as pool:
        result = pool.map(fu, produce_data(data))
    # result is an ordered list with one entry per chunk

There are several other ways to process data with multiprocessing.
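For example, since you only care that a free core immediately picks up the next job, Pool.imap_unordered is a close fit: it hands out one task at a time and yields results in whatever order the workers finish. The sketch below is a minimal illustration under that assumption; heavy_compute and the datasets list are hypothetical stand-ins for your own 15-second function and your tens of thousands of 10 kB inputs.

import multiprocessing

def heavy_compute(dataset):
    # stand-in for your ~15 s CPU-bound function; returns a small array
    return [min(dataset), max(dataset), sum(dataset)]

if __name__ == "__main__":
    # hypothetical inputs; replace with your real datasets
    datasets = [list(range(i, i + 100)) for i in range(1000)]
    with multiprocessing.Pool() as pool:  # defaults to one worker per core
        # each worker is handed a new dataset as soon as it becomes free;
        # results are yielded in completion order, not submission order
        for result in pool.imap_unordered(heavy_compute, datasets):
            print(result)

Unlike Pool.map, this does not preserve input order, so if you need to match results back to inputs, include an identifier in the return value.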
