
Process to process computation and communication in MPI

P1 P2
P3 P4

1 2 3 4
5 6 7 8
1 2 3 4
0 6 0 8

Suppose P1, P2, P3, P4 are processes: P1 has data points 1 2 5 6, P2 has 3 4 7 8, P3 has 1 2 0 6, and P4 has 3 4 0 8. I want to perform a stencil computation on this data such that the new value of 6 will be the average of (2, 5, 7, 2). However, 7 is a data point of P2 and 2 is a data point of P3. How to solve this? I am able to run this for a single process, but how do I approach it with plain MPI_Send and MPI_Recv? And also using MPI_Type_contiguous or MPI_Type_vector and collective communication?

Any help will be much appreciated. Thank you.

There is a well-known solution to this problem that goes under the name of ghost cells or halos. The idea is to surround each sub-array with one or more additional layers of cells, depending on the stencil. At the beginning of each iteration, each process syncs the state of its halo by exchanging data with its nearest neighbours, an operation called a halo swap. Halos provide the necessary data to compute the new state of all the inner cells, but they lack the data needed for their own update, so their content becomes stale after one iteration, which is why they are sometimes called "ghosts".

This blog post presents the idea clearly, although the language used is Julia. Since its About page says it is fine to reuse the content elsewhere, I've shamelessly borrowed the following illustration:

Ghost cells. Taken from http://www.claudiobellei.com/2018/09/30/julia-mpi/

The figure shows halos only along the borders between the processes, but in practice it is easier to have halos on all four sides and only update the ones that are needed, because that makes the code more symmetric and reduces its complexity.

The decomposition in your particular case goes like this:

1 2 3 4
5 6 7 8
1 2 3 4
0 6 0 8

first becomes

1 2 | 3 4
5 6 | 7 8
----+----
1 2 | 3 4
0 6 | 0 8

Add halos:

x x x x | x x x x
x 1 2 x | x 3 4 x
x 5 6 x | x 7 8 x
x x x x | x x x x
--------+--------
x x x x | x x x x
x 1 2 x | x 3 4 x
x 0 6 x | x 0 8 x
x x x x | x x x x
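Building the halo-padded array is a purely local step done before any communication. A minimal sketch (the function name and the sentinel value are my own choices for illustration):

```c
#include <string.h>

/* Surround an ny x nx interior block with a one-cell halo: copy it into
   the centre of an (ny+2) x (nx+2) array whose border cells start out
   as a sentinel value (the 'x' cells above). */
void add_halo(const double *interior, int ny, int nx,
              double *halo, double sentinel)
{
    for (int i = 0; i < (ny + 2) * (nx + 2); i++)
        halo[i] = sentinel;                       /* fill everything */
    for (int i = 0; i < ny; i++)                  /* then paste the interior */
        memcpy(&halo[(i + 1) * (nx + 2) + 1],
               &interior[i * nx], nx * sizeof(double));
}
```

For P1's 2x2 interior {1 2 / 5 6} this produces the 4x4 halo-padded block shown above.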

Now, before the stencil operation, you need to perform the halo swap. It goes in the steps shown below. Swaps are performed in pairs along each dimension. The order doesn't really matter.

1.1) Horizontal swap in the east direction

Each process sends its rightmost column to the process to its right. The receiver places the data in its left halo column:

x x [x] x | [x] x x x     x x [x] x | [x] x x x
x 1 [2] x | [x] 3 4 x     x 1 [2] x | [2] 3 4 x
x 5 [6] x | [x] 7 8 x     x 5 [6] x | [6] 7 8 x
x x [x] x | [x] x x x     x x [x] x | [x] x x x
----------+---------- --> ----------+----------
x x [x] x | [x] x x x     x x [x] x | [x] x x x
x 1 [2] x | [x] 3 4 x     x 1 [2] x | [2] 3 4 x
x 0 [6] x | [x] 0 8 x     x 0 [6] x | [6] 0 8 x
x x [x] x | [x] x x x     x x [x] x | [x] x x x

1.2) Horizontal swap in the west direction

Each process sends its leftmost column to the process to its left. The receiver places the data in its right halo column:

x x x [x] | x [x] x x     x x x [x] | x [x] x x
x 1 2 [x] | 2 [3] 4 x     x 1 2 [3] | 2 [3] 4 x
x 5 6 [x] | 6 [7] 8 x     x 5 6 [7] | 6 [7] 8 x
x x x [x] | x [x] x x     x x x [x] | x [x] x x
----------+---------- --> ----------+----------
x x x [x] | x [x] x x     x x x [x] | x [x] x x
x 1 2 [x] | 2 [3] 4 x     x 1 2 [3] | 2 [3] 4 x
x 0 6 [x] | 6 [0] 8 x     x 0 6 [0] | 6 [0] 8 x
x x x [x] | x [x] x x     x x x [x] | x [x] x x

2.1) Vertical swap in the south direction

Each process sends its bottom row to the process below it. The receiver places the data in its top halo row:

[x x x x | x x x x]   [x x x x | x x x x]
 x 1 2 3 | 2 3 4 x     x 1 2 3 | 2 3 4 x
[x 5 6 7 | 6 7 8 x]   [x 5 6 7 | 6 7 8 x]
 x x x x | x x x x     x x x x | x x x x
 --------+-------- --> --------+--------
[x x x x | x x x x]   [x 5 6 7 | 6 7 8 x]
 x 1 2 3 | 2 3 4 x     x 1 2 3 | 2 3 4 x
[x 0 6 0 | 6 0 8 x]   [x 0 6 0 | 6 0 8 x]
 x x x x | x x x x     x x x x | x x x x

2.2) Vertical swap in the north direction

Each process sends its top row to the process above it. The receiver places the data in its bottom halo row:

 x x x x | x x x x     x x x x | x x x x
[x 1 2 3 | 2 3 4 x]   [x 1 2 3 | 2 3 4 x]
 x 5 6 7 | 6 7 8 x     x 5 6 7 | 6 7 8 x
[x x x x | x x x x]   [x 1 2 3 | 2 3 4 x]
 --------+-------- --> --------+--------
 x 5 6 7 | 6 7 8 x     x 5 6 7 | 6 7 8 x
[x 1 2 3 | 2 3 4 x]   [x 1 2 3 | 2 3 4 x]
 x 0 6 0 | 6 0 8 x     x 0 6 0 | 6 0 8 x
[x x x x | x x x x]   [x x x x | x x x x]

Each of those operations can be implemented with a single MPI call, MPI_Sendrecv. The send part sends the local data row or column and the receive part receives it into the local halo row or column.

The final result after the four swaps is:

x x x x | x x x x
x 1 2 3 | 2 3 4 x
x 5 6 7 | 6 7 8 x
x 1 2 3 | 2 3 4 x
--------+--------
x 5 6 7 | 6 7 8 x
x 1 2 3 | 2 3 4 x
x 0 6 0 | 6 0 8 x
x x x x | x x x x
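As a sanity check, the four steps can be simulated in a single process for this 2x2 grid, with plain array copies standing in for MPI_Sendrecv (helper names are mine). Note that every swap moves a whole row or column of length 4, including the halo ends, which is how the corners get filled:

```c
#include <string.h>

enum { N = 4 };   /* 2x2 interior plus a one-cell halo on each side */

/* blocks[p] is the halo-padded sub-array of process p, with ranks laid
   out row-major on the 2x2 grid:  0 1
                                   2 3                                  */
static void copy_col(double (*dst)[N], int dc, double (*src)[N], int sc)
{
    for (int i = 0; i < N; i++)
        dst[i][dc] = src[i][sc];
}

static void copy_row(double (*dst)[N], int dr, double (*src)[N], int sr)
{
    memcpy(dst[dr], src[sr], N * sizeof(double));
}

void halo_swap(double blocks[4][N][N])
{
    /* 1.1 east: rightmost data column -> right neighbour's left halo */
    copy_col(blocks[1], 0, blocks[0], N - 2);
    copy_col(blocks[3], 0, blocks[2], N - 2);
    /* 1.2 west: leftmost data column -> left neighbour's right halo */
    copy_col(blocks[0], N - 1, blocks[1], 1);
    copy_col(blocks[2], N - 1, blocks[3], 1);
    /* 2.1 south: bottom data row -> lower neighbour's top halo */
    copy_row(blocks[2], 0, blocks[0], N - 2);
    copy_row(blocks[3], 0, blocks[1], N - 2);
    /* 2.2 north: top data row -> upper neighbour's bottom halo */
    copy_row(blocks[0], N - 1, blocks[2], 1);
    copy_row(blocks[1], N - 1, blocks[3], 1);
}
```

Filling the four blocks with the data from the question and running halo_swap reproduces the arrangement shown above, diagonal corner values included.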

You'll probably notice that even the corner elements in each halo contain the right diagonal element that one would expect to be there. The beauty of this is that you can simply add more steps to extend it to as many dimensions as you need, and all elements automagically find their right place in the halo region.

You now have all four neighbours of 6 available locally and can proceed with averaging their values. Note that you only use the values in the halos and do not update them.
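Once the halos are filled, the averaging itself is an ordinary loop over the interior cells that reads, but never writes, the halo cells. A sketch for the 2x2-interior-per-process case (pure C, no MPI needed in this step; the function name is mine):

```c
#define NX 2   /* interior columns per process */
#define NY 2   /* interior rows per process */

/* Both arrays include the one-cell halo, hence (NY+2) x (NX+2).
   Only the interior cells (indices 1..NY, 1..NX) are written. */
void stencil_average(double in[NY + 2][NX + 2], double out[NY + 2][NX + 2])
{
    for (int i = 1; i <= NY; i++)
        for (int j = 1; j <= NX; j++)
            out[i][j] = (in[i - 1][j] + in[i + 1][j] +
                         in[i][j - 1] + in[i][j + 1]) / 4.0;
}
```

On P1's post-swap block the new value of 6 comes out as (2 + 5 + 7 + 2) / 4 = 4, exactly the average you asked for.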

Some additional notes:

  1. This works no matter whether you have periodic boundary conditions or not. With periodic boundary conditions, P2 is the right neighbour of P1 and P1 is the right neighbour of P2 when doing the eastward shift. Also, P1 is the left neighbour of P2 and P2 is the left neighbour of P1. The same applies in the vertical direction.

  2. You do not need special code to handle processes that are on the boundary. If a process doesn't have a right (or left, or top, or bottom) neighbour, simply send to or receive from MPI_PROC_NULL. I.e., the code goes like this:

     int right = compute_right_rank();
     int left  = compute_left_rank();
     int up    = compute_top_rank();
     int down  = compute_bottom_rank();

     MPI_Sendrecv(right_column, 1, columndt, right, 0,
                  left_halo,    1, columndt, left,  0,
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     MPI_Sendrecv(left_column,  1, columndt, left,  0,
                  right_halo,   1, columndt, right, 0,
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     MPI_Sendrecv(bottom_row,   1, rowdt, down, 0,
                  top_halo,     1, rowdt, up,   0,
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     MPI_Sendrecv(top_row,      1, rowdt, up,   0,
                  bottom_halo,  1, rowdt, down, 0,
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);

     Here, compute_right_rank() should return rank + 1 if there is a rank to the right, or MPI_PROC_NULL otherwise. Sending to MPI_PROC_NULL or receiving from it is a no-op, i.e., nothing happens. It allows you to write code without ifs.
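     For a row-major Py x Px process grid, those rank helpers are pure arithmetic. A sketch (the function names and grid convention are my own; in real code you would include <mpi.h> and use its MPI_PROC_NULL rather than the fallback definition below):

```c
/* Fallback so this standalone sketch compiles without <mpi.h>;
   real MPI code must use the value from the MPI header instead. */
#ifndef MPI_PROC_NULL
#define MPI_PROC_NULL (-2)
#endif

/* Ranks laid out row-major on a Py-rows x Px-cols grid:
   rank = row * Px + col. */
int right_rank(int rank, int Px)         { return rank % Px == Px - 1 ? MPI_PROC_NULL : rank + 1;  }
int left_rank (int rank, int Px)         { return rank % Px == 0      ? MPI_PROC_NULL : rank - 1;  }
int down_rank (int rank, int Px, int Py) { return rank / Px == Py - 1 ? MPI_PROC_NULL : rank + Px; }
int up_rank   (int rank, int Px, int Py) { return rank / Px == 0      ? MPI_PROC_NULL : rank - Px; }
```

     With a 2x2 grid, rank 0 (P1) gets rank 1 as its right neighbour, rank 2 below it, and MPI_PROC_NULL to its left and above.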

     columndt is an MPI datatype that corresponds to a column of the array; you can construct it using MPI_Type_vector. rowdt is an MPI datatype that represents a whole row of the array; construct it using MPI_Type_contiguous.
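     To see why MPI_Type_vector is the right tool: in a row-major ny x nx array, a column is ny single elements separated by a stride of nx, which is what MPI_Type_vector(ny, 1, nx, MPI_DOUBLE, &columndt) describes; a row is just nx contiguous elements, hence MPI_Type_contiguous(nx, MPI_DOUBLE, &rowdt). This local pack function (name mine, for illustration) walks the same pattern the column datatype would:

```c
/* Gather column 'col' of a row-major ny x nx array into a contiguous
   buffer: ny pickups with stride nx, the access pattern described by
   MPI_Type_vector(ny, 1, nx, ...). */
void pack_column(const double *a, int ny, int nx, int col, double *out)
{
    for (int i = 0; i < ny; i++)
        out[i] = a[i * nx + col];
}
```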

  3. It is super easy to compute the ranks of the neighbours if the ranks are in a Cartesian communicator. MPI_Cart_create and MPI_Cart_shift are your best friends here. You may also use MPI neighbourhood collectives to reduce the number of MPI calls even further.

  4. The bottom halos of the bottom ranks are not filled because the boundary conditions are not periodic. The same is true of the right halos of the rightmost ranks, the top halos of the top ranks, and the left halos of the leftmost ranks. You may want to pre-fill them with a special value, e.g., 0. That value will never change because no communication ever puts anything there.

  5. If your computation is iterative, you must perform the halo swap before each iteration. If the halo swap is too slow compared to the computation, you may increase the thickness of the halo and make it two or more layers thick. Use the outer layers to update the values in the inner layers. A halo three layers thick requires a swap only every three iterations.
