
Number of subarray data types for exchanging 2D halos in 3D process decomposition in MPI

Assume a global cube of dimensions GX*GY*GZ which is decomposed using a 3D Cartesian topology into 3D cubes of size PX*PY*PZ on each process. Adding halos for the exchange of data, this becomes (PX+2)*(PY+2)*(PZ+2). Assuming we use the Subarray datatype for 2D halo exchange - do we need to define 12 subarray types?

My reasoning is this: for the YZ plane we create one subarray type for sending and one for receiving, since the starting coordinates are specified WITHIN the subarray datatype itself. But there are 2 YZ planes, which results in 4 subarray datatypes. Though the global and local data sizes remain the same, the differing starting indexes mean we need to define 4 distinct subarray types. Isn't it better to send four of these planes using a Vector datatype and the remaining two using a Subarray datatype?

You have three patterns of data access here - sending/receiving an X-face of the subdomain, a Y-face, and a Z-face - so you need three different ways of describing those patterns. Which and how many types you use to describe them is largely dependent on what you find the clearest way of expressing and using those patterns.

Let's say you have, locally, PX=8, PY=5, PZ=7, so that including the halo, the local subdomains are 10x7x9. This is in C, so we'll assume the data is stored in some contiguous array arr[ix][iy][iz], so that values (ix,iy,1) and (ix,iy,2) are contiguous (offset by one item size - say 8 bytes for doubles), values (ix,1,iz) and (ix,2,iz) are offset by PZ+2 [that is, 9] values, and (1,iy,iz) and (2,iy,iz) are offset by (PY+2)*(PZ+2) [= 7*9 = 63] values.

So let's see how this plays out, sketching out faces of the grid with z/y being left/right and up/down, and x shown in neighbouring panels. For simplicity we'll include corner cells in what we send/receive.

The data you'd need to send a y-face to the upper neighbour looks like:

       x = 0          x = 1     ...      x = 9        Local Grid Size:
    +---------+    +---------+        +---------+     PX = 8
6   |         |    |         |        |         |     PY = 5
5   |@@@@@@@@@|    |@@@@@@@@@|        |@@@@@@@@@|     PZ = 7
4  ^|         |   ^|         |       ^|         |
3  ||         |   ||         |       ||         |
2  y|         |   y|         |       y|         |
1   |         |    |         |        |         |
0   |         |    |         |        |         |
    +---------+    +---------+        +---------+
     012345678      012345678   ...    012345678
        z->            z->                z->

That is, it would start at [0][PY][0] (eg, [0][5][0]) and extend to [PX+1][PY][PZ+1]. So you'd start at [0][PY][0]...[0][PY][PZ+1], which are PZ+2 contiguous values, and then go to [1][PY][0] - which is a jump of (PY+2)*(PZ+2) values from [0][PY][0], the start of the previous block - and take another PZ+2 contiguous values, and so on. You could express this simply as:

  • MPI_Type_vector with count PX+2, blocklen PZ+2, and stride (PY+2)*(PZ+2), or
  • MPI_Type_create_subarray, with a slice subsize of [PX+2,1,PZ+2], starting at [0,PY,0]

They are exactly equivalent, and there is no performance difference.

Now, let's consider receiving this data:

       x = 0          x = 1     ...      x = 9        Local Grid Size:
    +---------+    +---------+        +---------+     PX = 8
6   |         |    |         |        |         |     PY = 5
5   |         |    |         |        |         |     PZ = 7
4  ^|         |   ^|         |       ^|         |
3  ||         |   ||         |       ||         |
2  y|         |   y|         |       y|         |
1   |         |    |         |        |         |
0   |@@@@@@@@@|    |@@@@@@@@@|        |@@@@@@@@@|
    +---------+    +---------+        +---------+
     012345678      012345678   ...    012345678
        z->            z->                z->

Crucially, the data pattern needed is exactly the same: PZ+2 values, then skip (PY+2)*(PZ+2) values from the start of that last block, then another PZ+2 values. We could describe it as:

  • MPI_Type_vector with count PX+2, blocklen PZ+2, and stride (PY+2)*(PZ+2), or
  • MPI_Type_create_subarray, with a slice subsize of [PX+2,1,PZ+2], starting at [0,0,0]

The only difference is the starting position of the subarray. But this isn't as big a difference as it seems!

When you actually use the subarray type in a send or receive (say), you pass the routine a pointer to some data, and then give it a subarray type with some starting position and slice description. MPI then skips ahead to that starting position and uses the data layout described by that slice.

So while it is perfectly fine to define and use four subarray types:

MPI_Type_create_subarray(ndims=3, sizes=[PX+2,PY+2,PZ+2], subsizes=[PX+2,1,PZ+2], 
                         starts=[0,0,0],... &recv_down_yface_t);
MPI_Type_create_subarray(...all the same...
                         starts=[0,1,0],... &send_down_yface_t);
MPI_Type_create_subarray(...all the same...
                         starts=[0,PY,0],... &send_up_yface_t);
MPI_Type_create_subarray(...all the same...
                         starts=[0,PY+1,0],... &recv_up_yface_t);

/* Send lower yface */
MPI_Send(&(arr[0][0][0]), 1, send_down_yface_t, ... );
/* Send upper yface */
MPI_Send(&(arr[0][0][0]), 1, send_up_yface_t, ... );
/* Receive lower face */
MPI_Recv(&(arr[0][0][0]), 1, recv_down_yface_t, ... );
/* Receive upper face */
MPI_Recv(&(arr[0][0][0]), 1, recv_up_yface_t, ... );

which declare four equivalent patterns with different starting points, you can also just define one, and use it pointing at different starting points for the data you need:

MPI_Type_create_subarray(ndims=3, sizes=[PX+2,PY+2,PZ+2], subsizes=[PX+2,1,PZ+2], 
                             starts=[0,0,0],... &yface_t);
/* ... */
/* Send lower yface */
MPI_Send(&(arr[0][1][0]), 1, yface_t, ... );
/* Send upper yface */
MPI_Send(&(arr[0][PY][0]), 1, yface_t, ... );
/* Receive lower face */
MPI_Recv(&(arr[0][0][0]), 1, yface_t, ... );
/* Receive upper face */
MPI_Recv(&(arr[0][PY+1][0]), 1, yface_t, ... );

The above is exactly the way you'd use the corresponding vector type - by pointing it at the first item to send/receive.
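For completeness, a sketch of what that vector-type version might look like, in the same abbreviated pseudocode style as the snippets above (the type name yface_vec_t is made up for illustration):

```c
/* One vector type describes a y-face: PX+2 rows of PZ+2 contiguous
 * values, with consecutive rows (PY+2)*(PZ+2) values apart. */
MPI_Type_vector(count=PX+2, blocklen=PZ+2, stride=(PY+2)*(PZ+2),
                MPI_DOUBLE, &yface_vec_t);

/* Send upper yface - point at its first element */
MPI_Send(&(arr[0][PY][0]), 1, yface_vec_t, ... );
/* Receive lower face */
MPI_Recv(&(arr[0][0][0]),  1, yface_vec_t, ... );
```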

If you choose to use a subarray type, either way of using it is perfectly fine, and you'll see both choices made in various pieces of software. It's just a matter of which you find clearer - 4 types per pattern (one per offset), or using the offset explicitly in the send/receive. I personally find the 1-type approach much clearer, but there's no unambiguous right answer to be found for that question.

As to whether to use a subarray or vector type (say), it's easiest to look at the other two patterns you'll need to support. With an X-face you have a couple more options, since the data is contiguous:

  • (PY+2)*(PZ+2) MPI_DOUBLEs
  • 1 MPI_Type_contiguous of (PY+2)*(PZ+2) MPI_DOUBLEs
  • MPI_Type_vector with count 1, blocklen (PY+2)*(PZ+2), and any stride; or with count PY+2, blocklen PZ+2, and stride PZ+2; or any equivalent combination
  • a subarray, with a slice subsize of [1,PY+2,PZ+2], starting at an appropriate location

and for z-faces:

  • MPI_Type_vector of count (PX+2)*(PY+2), blocklen 1, and stride of PZ+2
  • a subarray, with a slice subsize of [PX+2,PY+2,1], starting at an appropriate location.

So again, it all comes down to clarity. The subarray types look the most similar across all directions, and the difference between them is fairly clear; whereas if I showed you a bunch of vector types all declared in the same piece of code, you'd have to do some sketching on a whiteboard to make sure I hadn't accidentally switched them around. The subarray also generalizes most easily - if you move to a method that needs, say, 2 halo cells on each side, or you want to stop sending the corner cells, the modification to the subarray is trivial, whereas you have to do some work to build something equivalent with vectors.
