I have a folder containing multiple datasets. I want to run a model over these datasets and distribute the load across multiple cores, hopefully to reduce the overall run time of the data processing.
My computer has 8 cores. My first attempt is below. It's only really a sketch, but using htop I can see that only 1 core is being employed for this job. Multi-core newbie here.
import pandas as pd
import multiprocessing
import os
from library_example import model_example

def worker(file_):
    with open(file_, 'r') as f_open:
        data = f_open.read()
    # Run model
    model_results = model_example(file_)
    # Save results to a CSV file next to the input file
    to_save = pd.Series(model_results)
    to_save.to_csv(file_[:-4] + "_results.csv")

file_location_ = "/home/datafiles/"

if __name__ == '__main__':
    for filename in os.listdir(file_location_):
        p = multiprocessing.Process(target=worker, args=(file_location_ + filename,))
        p.start()
        p.join()
Try moving the p.join() out of the loop. It waits for the process to complete, which effectively makes this a serial process: you kick off each process (i.e. start) and then immediately wait for it (i.e. join) before starting the next one. Instead you can try something like this:
# construct the workers
workers = [
    multiprocessing.Process(target=worker, args=(file_location_ + filename,))
    for filename in os.listdir(file_location_)
]

# start them
for proc in workers:
    proc.start()

# now we wait for them
for proc in workers:
    proc.join()
(I didn't try running this in your code but something like that should work.)
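For anyone who wants to try the start-all-then-join-all pattern without the real model, here is a minimal self-contained sketch. Everything in it is a stand-in and not from the original post: the worker just counts characters and writes a "_results.txt" file, run_all is a hypothetical helper name, and the input files are created in a temporary folder.

```python
import multiprocessing
import os
import tempfile


def worker(path):
    # Stand-in for the real model (assumption): count the characters
    # in the file and write the count next to it.
    with open(path, "r") as f_open:
        data = f_open.read()
    out_path = os.path.splitext(path)[0] + "_results.txt"
    with open(out_path, "w") as f_out:
        f_out.write(str(len(data)))
    return out_path


def run_all(paths):
    # Build one process per file, start them all, then wait for them all.
    procs = [multiprocessing.Process(target=worker, args=(p,)) for p in paths]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    # After join(), exitcode is 0 for a worker that finished normally.
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for i in range(4):
            path = os.path.join(folder, "data%d.txt" % i)
            with open(path, "w") as f:
                f.write("x" * (i + 1))
            paths.append(path)
        print(run_all(paths))
```

Note that this starts one process per file, so with many files you would have far more processes than cores.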
EDIT: If you want to limit the number of workers/processes then I'd recommend just using a Pool. You can specify how many processes to use and then map(..) the arguments to those processes. Example:
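# construct a pool of workers
pool = multiprocessing.Pool(6)
# map blocks until every file has been processed
pool.map(worker, [file_location_ + filename for filename in os.listdir(file_location_)])
# no more tasks will be submitted; wait for the worker processes to exit
pool.close()
pool.join()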
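A slightly fuller sketch of the Pool approach, in case it helps. The names list_inputs and run_pool are made up for illustration, the worker is a stand-in that returns the file size instead of running a model, and the ".csv" suffix is an assumption. The filter is there because the original worker writes its "_results.csv" files into the same folder it reads from, so a second run over os.listdir would otherwise pick those up as inputs.

```python
import multiprocessing
import os
import tempfile


def list_inputs(folder, suffix=".csv"):
    # Keep only input files; skip result files written by an earlier run.
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(suffix) and not name.endswith("_results" + suffix)
    )


def worker(path):
    # Stand-in for the real model (assumption): return the file size in bytes.
    return path, os.path.getsize(path)


def run_pool(folder, processes=None):
    # processes=None lets Pool choose a worker count based on the CPU count.
    # map() blocks until all results are in, so leaving the with-block is safe.
    with multiprocessing.Pool(processes) as pool:
        return pool.map(worker, list_inputs(folder))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for i in range(4):
            with open(os.path.join(folder, "data%d.csv" % i), "w") as f:
                f.write("x" * (i + 1))
        for path, size in run_pool(folder, processes=2):
            print(path, size)
```

Returning the result from the worker and collecting it from map(..) in the parent is an alternative to writing files inside each worker. Which one is better depends on how large the model output is.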