I have a list of 100 tuples, tuplelist, that serve as inputs to an external function. The external function returns a value, and that value is appended to an array, like so (MainFile.py):
from ExternalPythonFile import ExternalFunction

valuelist = []
for a, b in tuplelist:
    value = ExternalFunction(a, b)
    # more functions here
    valuelist.append(value)
print(len(valuelist))
The output of print(len(valuelist)) when using the for-loop above is (100,).
Now, since the order of the tuples and how they are appended does not matter in my case, I wanted to parallelize the for-loop: it takes ~10 min to process 100 tuples, and I expect to scale that number. I tried the joblib implementation below (MainFileJoblib.py):
from ExternalPythonFile import ExternalFunction
from joblib import Parallel, delayed, parallel_backend
import multiprocessing

valuelist = []

def TupleFunction(a, b):
    value = ExternalFunction(a, b)
    # more functions here
    valuelist.append(value)

with parallel_backend('multiprocessing'):
    Parallel(n_jobs=10)(delayed(TupleFunction)(a, b) for a, b in tuplelist)

print(len(valuelist))
I'm running all of this on a Unix compute cluster, but the runtime was still similar, at ~8 min. The output was also wrong: it printed (0,).
Looking at htop, I found that 10 cores were in fact being used, but each core only at 20% usage.
I have also tried running the joblib implementation via SLURM:
srun --ntasks=1 --cpus-per-task=10 python3 MainFileJoblib.py
which was definitely faster, at around ~2 min, but again it gave the wrong result, (0,).
What's the best way to parallelize the original for-loop?
Joblib manages the creation and population of the output list by itself. With the multiprocessing backend, each worker process gets its own copy of valuelist, so appends made inside the workers never reach the parent process; that is why you see 0. Instead of appending to a shared list, return the value and let Parallel collect the results. The code can be easily fixed with:
from ExternalPythonFile import ExternalFunction
from joblib import Parallel, delayed, parallel_backend
import multiprocessing

with parallel_backend('multiprocessing'):
    valuelist = Parallel(n_jobs=10)(delayed(ExternalFunction)(a, b) for a, b in tuplelist)

print(len(valuelist))
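If you'd rather avoid the joblib dependency, the standard library's concurrent.futures gives the same return-and-collect pattern. A minimal sketch, with a placeholder standing in for ExternalFunction (which I don't have access to):

```python
from concurrent.futures import ProcessPoolExecutor

# Placeholder for ExternalFunction -- substitute your real import.
def external_function(a, b):
    return a + b

def parallel_values(tuplelist, workers=10):
    # executor.map collects the return values in input order,
    # just like Parallel(...) returning a list.
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(external_function, *zip(*tuplelist)))

if __name__ == "__main__":
    tuplelist = [(i, i * 2) for i in range(100)]
    valuelist = parallel_values(tuplelist)
    print(len(valuelist))  # 100
```

Note that the worker function must be defined at module top level so it can be pickled for the child processes.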
If for some reason you need to update an array-like object in place, you can use NumPy's memmap, as in the following minimal example:
import tempfile
from pathlib import Path

import numpy as np
from ExternalPythonFile import ExternalFunction
from joblib import Parallel, delayed, parallel_backend
import multiprocessing

# define a function that fills one slot of the shared array
def fill_array(mm_file, i, tuple_val):
    a, b = tuple_val
    value = ExternalFunction(a, b)
    # more functions here
    mm_file[i] = value

# create a temporary folder
tmp_dir = tempfile.mkdtemp()

# create a file into which to dump the array
values_fname_memmap = Path(tmp_dir).joinpath("values_memmap")
values_memmap = np.memmap(values_fname_memmap.as_posix(),
                          dtype=np.float64,
                          shape=(len(tuplelist),),
                          mode='w+')

with parallel_backend('multiprocessing'):
    Parallel(n_jobs=10)(delayed(fill_array)(values_memmap, i, ab)
                        for i, ab in enumerate(tuplelist))

print(len(values_memmap))
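To illustrate just the memmap mechanics independently of joblib (with made-up data): writes made through one memmap handle land in the backing file and are visible to any other handle opened on that file, which is what lets separate worker processes share results this way.

```python
import tempfile
from pathlib import Path

import numpy as np

tmp_dir = tempfile.mkdtemp()
fname = Path(tmp_dir) / "demo_memmap"

# Writer: create a 5-element float array backed by a file.
mm = np.memmap(fname, dtype=np.float64, shape=(5,), mode='w+')
for i in range(5):
    mm[i] = i * 10.0
mm.flush()  # make sure the data hits the file

# Reader: a fresh handle on the same file sees the written values.
mm2 = np.memmap(fname, dtype=np.float64, shape=(5,), mode='r')
print(list(mm2))  # [0.0, 10.0, 20.0, 30.0, 40.0]
```

Remember to clean up the temporary folder (e.g. with shutil.rmtree) once you're done with the results.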
If you need to apply a set of transformations to the value (the # more functions here part), just make a wrapper around ExternalFunction that returns the desired value for a given tuple (a, b).
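A sketch of such a wrapper. ExternalFunction is stubbed out here since I don't have it, and the post-processing step is a placeholder for whatever your # more functions here does:

```python
# Stub for ExternalFunction -- replace with your real import.
def ExternalFunction(a, b):
    return a * b

# Wrapper: runs ExternalFunction plus the extra steps, and *returns*
# the final value instead of appending to a shared list, so that
# Parallel (or executor.map) can collect it.
def wrapped(ab):
    a, b = ab
    value = ExternalFunction(a, b)
    # placeholder post-processing, standing in for your extra steps
    value = value + 1
    return value

print([wrapped(t) for t in [(1, 2), (3, 4)]])  # [3, 13]
```

Then use delayed(wrapped)(ab) for ab in tuplelist in the Parallel(...) call instead of ExternalFunction.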
I hope that, despite the late response, this is still useful to you.