mpi4py on HPC: comm.gather

Question

My aim is to iterate through each element of a large 2D array (data) and do some heavy processing on each element. I therefore want to use multiple MPI processes, each working on a portion of the array. My problem is that I don't know exactly how to write the code that gathers all the data together at the end. Here is some example code:

import numpy as np
import math
from mpi4py import MPI

M = 400
N = 300
data = np.random.rand(M,N)
result_a = np.zeros((M,N))
result_b = np.zeros((M,N))

def process_function(data):
    a = data**2
    b = data**0.5
    return a,b

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
minimum = 0
maximum = int(M*N)
perrank = maximum//size

for index in range(minimum + rank*perrank, minimum + (rank+1)*perrank):
    i = int(math.floor(index/N))
    j = int(index % N)

    a,b = process_function(data[i,j])
    result_a[i,j] = a
    result_b[i,j] = b

a_gath = comm.gather(result_a, root=0)
b_gath = comm.gather(result_b, root=0)

print(np.shape(a_gath))
print('---')
print(np.shape(b_gath))

Unfortunately, for my real problem, when I save both a_gath and b_gath to disk (as a pickle), they contain only a single () (i.e. type None) when I reload them. Is there something else I should be doing before/after comm.gather?

Here is my submission script:

#!/bin/bash -l

#$ -S /bin/bash
#$ -l h_rt=00:05:00
#$ -l mem=2G
#$ -l tmpfs=10G
#$ -pe mpi 5
#$ -N stack_test
#$ -notify
#$ -wd /home/user/Scratch/

module load gcc-libs
module load python3/recommended
module unload compilers mpi
module load compilers/gnu/4.9.2
module load mpi/openmpi/3.1.1/gnu-4.9.2
module load mpi4py
module list

python_infile=test.py

echo ""
echo "Running python < $python_infile ..."
echo ""
gerun python $python_infile

I submit this script simply with qsub js_test.sh

The returned .o file for this toy example shows that, in this case, 4 of the 5 MPI processes contain type None information. Does this mean that if I then saved a_gath and b_gath to disk, it would save the output of the last process to write, which is type None? I would expect that after using comm.gather, I would have a single array of size MxN for each of a_gath and b_gath:

()
---
()
()
---
()
()
---
()
(5, 400, 300)
---
(5, 400, 300)
()
---
()

Many thanks.

Answer 1

To clarify the answer that you've found for yourself in the comments: MPI_Gather is a rooted operation: its results are not identical across all ranks, and specifically differ on the rank provided in the root argument.

In the case of Gather, your finding that rank 0 is the one that ends up with the data is exactly correct for the way you've called it (with root=0).

While in principle MPI supports multiple program, multiple data (MPMD) execution, in which different ranks run different code, in practice most MPI code is written in a single program, multiple data (SPMD) style, like what you've written. Because all ranks are running the same body of code, it's up to you to check, after returning from a rooted operation like MPI_Gather, whether the rank that you're running on is the root, and to execute different code paths accordingly. If you don't, then every rank is going to execute these lines:

print(np.shape(a_gath))
print('---') 
print(np.shape(b_gath))

which, as you've noted, does not print the results you expected for a_gath and b_gath except on rank 0.

Try the following:

a_gath = comm.gather(result_a, root=0)
b_gath = comm.gather(result_b, root=0)

if rank == 0:
    print(np.shape(a_gath))
    print('---')
    print(np.shape(b_gath))
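
On the expectation of a single MxN array: on the root, comm.gather returns a Python list with one entry per rank, which is why the shape printed on rank 0 is (5, 400, 300) rather than (400, 300). Since each rank in the question fills only its own index range and leaves the rest at zero, one way to merge the pieces is to sum over the rank axis. The sketch below simulates the gathered list with plain NumPy (no MPI), so it only illustrates the combining step. The helper name combine_gathered is hypothetical, and the approach assumes the per-rank index ranges are disjoint.

```python
import numpy as np

def combine_gathered(parts):
    """Merge the per-rank arrays that comm.gather returns on the root.

    Assumption: every rank filled a disjoint set of elements and left
    the rest at zero, as in the question's code.
    """
    return np.sum(parts, axis=0)

# Simulate what rank 0 would receive from comm.gather with 5 ranks,
# using the same flat-index split as the question.
M, N, size = 400, 300, 5
data = np.random.rand(M, N)
perrank = (M * N) // size

parts = []
for rank in range(size):
    result_a = np.zeros((M, N))
    lo, hi = rank * perrank, (rank + 1) * perrank
    # .flat indexes the array by the same flat index the question loops over.
    result_a.flat[lo:hi] = data.flat[lo:hi] ** 2
    parts.append(result_a)

a_full = combine_gathered(parts)
print(np.shape(parts))  # one (M, N) array per rank
print(a_full.shape)     # a single (M, N) array
```

In the MPI program, a call like combine_gathered(a_gath) would go inside the if rank == 0: branch, because a_gath is None on every other rank. The same guard applies to the pickling step: writing the file only from rank 0 keeps non-root ranks from saving None. Note also that with perrank = maximum//size, any remainder elements are left unprocessed when M*N is not divisible by the number of ranks.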
