Parallelizing for-loops in Python using joblib & SLURM
I have a list of 100 tuples, tuplelist, that serve as inputs to an external function. The external function returns a value, and that value is appended to an array, like so (MainFile.py):
from ExternalPythonFile import ExternalFunction
valuelist = []
for a, b in tuplelist:
    value = ExternalFunction(a, b)
    # more functions here
    valuelist.append(value)
print(len(valuelist))
The output for print(len(valuelist)) when using the for-loop above is (100,).
Now, since the order of the tuples and how they are appended do not matter in my case, I wanted to parallelize the for-loop, as it takes ~10 min to process 100 tuples and I expect to scale that number up. I have tried a joblib implementation below (MainFileJoblib.py):
from ExternalPythonFile import ExternalFunction
from joblib import Parallel, delayed, parallel_backend
import multiprocessing
valuelist = []
def TupleFunction(a, b):
    value = ExternalFunction(a, b)
    # more functions here
    valuelist.append(value)

with parallel_backend('multiprocessing'):
    Parallel(n_jobs=10)(delayed(TupleFunction)(a, b) for a, b in tuplelist)
print(len(valuelist))
I'm running all of this on a Unix compute cluster, but the runtime was still similar, at ~8 min. The output was also wrong: it printed (0,).
Looking at htop, I found that 10 cores were in fact being used, but each core was only at 20% usage.
I have also tried to run the joblib implementation via SLURM:
srun --ntasks=1 --cpus-per-task=10 python3 MainFileJoblib.py
This was definitely faster, at around ~2 min, but it again gave the wrong result, (0,).
What's the best way to parallelize the original for-loop?
Your version prints (0,) because, with the multiprocessing backend, each worker appends to its own copy of valuelist in a separate process, so the list in the parent process never gets filled. Joblib manages the creation and population of the output list by itself, so the code can easily be fixed with:
from ExternalPythonFile import ExternalFunction
from joblib import Parallel, delayed, parallel_backend
import multiprocessing
with parallel_backend('multiprocessing'):
    valuelist = Parallel(n_jobs=10)(delayed(ExternalFunction)(a, b) for a, b in tuplelist)
print(len(valuelist))
If for some reason you need to update an array-like object, you could make use of a numpy memmap, as in the following minimal example:
import tempfile
from pathlib import Path

import numpy as np
from ExternalPythonFile import ExternalFunction
from joblib import Parallel, delayed, parallel_backend
import multiprocessing

# define function to update your array
def fill_array(mm_file, i, tuple_val):
    a, b = tuple_val
    value = ExternalFunction(a, b)
    # more functions here
    mm_file[i] = value

# create a temporary folder
tmp_dir = tempfile.mkdtemp()

# create a file where to dump your array
values_fname_memmap = Path(tmp_dir).joinpath("values_memmap")
values_memmap = np.memmap(values_fname_memmap.as_posix(),
                          dtype=np.float64,
                          shape=(len(tuplelist),),
                          mode='w+')

with parallel_backend('multiprocessing'):
    Parallel(n_jobs=10)(delayed(fill_array)(values_memmap, i, ab)
                        for i, ab in enumerate(tuplelist))

print(len(values_memmap))
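If you then need the results as a regular in-memory array, and want to clean up the temporary folder once you are done, something along these lines should work: np.array simply copies the memmap contents, and shutil.rmtree removes the folder backing it.
import shutil

# copy the memmapped values into a regular in-memory NumPy array
valuelist = np.array(values_memmap)
print(valuelist.shape)

# release the memmap and delete the temporary folder backing it
del values_memmap
shutil.rmtree(tmp_dir, ignore_errors=True)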
If you need to apply a set of transformations to the value (# more functions), just make a wrapper around ExternalFunction that outputs the desired value for a given tuple (a, b), as in the sketch below.
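For instance, a minimal sketch of such a wrapper; post_process here is just a placeholder for whatever your # more functions actually do:
def WrappedFunction(a, b):
    value = ExternalFunction(a, b)
    # placeholder for your '# more functions here' transformations
    value = post_process(value)
    return value

with parallel_backend('multiprocessing'):
    valuelist = Parallel(n_jobs=10)(delayed(WrappedFunction)(a, b) for a, b in tuplelist)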
I hope that despite the late response it may still be useful to you.