
Multi-threading / Parallel Processing

I have a few hundred input files that I need to pass to a function, which calculates some numbers and writes them to an output file. The function does not return any value, so each function call is independent.

Instead of calling the function serially, I tried multiprocessing, but the performance (execution time) is not much better. Any suggestions on improving the performance are welcome. Is multiprocessing even the way to go for this problem?

import multiprocessing as mp
NumProcess = 4

def Analysis(InputFile):
    # do some calcs
    # write results to an output file
    # return nothing
    pass

FileList = ['InputFile1.csv','InputFile2.csv','InputFile3.csv',....]
pool = mp.Pool(processes=NumProcess)
temp = [pool.apply_async(Analysis, args=(File) for File in FileList]
output = [p.get() for p in temp]
pool.close()

Does the multiprocessing call work at all? There is an error on the `apply_async` line: you don't pass the `File` argument as a tuple, and the closing parenthesis of the `apply_async` call is missing.

Corrected version (note the trailing comma to ensure args is a tuple):

temp = [pool.apply_async(Analysis, args=(File,)) for File in FileList]
