
An efficient way to perform an all-reduction in MPI of a value based on another variable?

As an example, let's say I have

int a = ...;
int b = ...;
int c;

where a is the result of some complex local calculation and b is some metric for the quality of a.

I'd like to send the best value of a to every process and store it in c, where "best" is defined by having the largest value of b.

I guess I'm just wondering if there is a more efficient way of doing this than doing an allgather on a and b and then searching through the resulting arrays.

The actual code involves sending and comparing several hundred values on up to several hundred/thousand processes, so any efficiency gains would be welcome.

You can pair the value of b with the rank of the process to find the rank that holds the maximum value of b. The MPI_DOUBLE_INT type is very useful for this purpose. You can then broadcast a from this rank so that every process ends up with that value.

#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    // Create random a and b on each rank.
    srand(123 + my_rank);
    double a = rand() / (double)RAND_MAX;
    double b = rand() / (double)RAND_MAX;

    struct
    {
        double value;
        int rank;
    } s_in, s_out;

    s_in.value = b;
    s_in.rank = my_rank;

    printf("before: %d, %f, %f\n", my_rank, a, b);

    // Find the maximum value of b and the corresponding rank.
    MPI_Allreduce(&s_in, &s_out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);
    b = s_out.value;

    // Broadcast from the rank with the maximum value.
    MPI_Bcast(&a, 1, MPI_DOUBLE, s_out.rank, MPI_COMM_WORLD);

    printf("after: %d, %f, %f\n", my_rank, a, b);

    MPI_Finalize();
}
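As a sanity check of what this computes, here is a serial C sketch with no MPI calls, assuming a[r] and b[r] stand for the values held by rank r; the function name best_a is hypothetical. MPI_MAXLOC resolves ties in the value by taking the smallest index, so the MPI_Allreduce + MPI_Bcast pair selects the same a as an allgather followed by a search that keeps the first maximum, without every rank having to receive the full arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Serial model of MPI_Allreduce(..., MPI_MAXLOC, ...) followed by MPI_Bcast.
 * a[r] and b[r] play the role of the values owned by rank r (hypothetical setup).
 * Returns the a owned by the rank with the largest b; ties go to the lowest
 * rank, as MPI_MAXLOC specifies. The winning rank is stored in *root. */
double best_a(const double *a, const double *b, size_t nprocs, size_t *root)
{
    assert(nprocs > 0);
    size_t best = 0;
    for (size_t r = 1; r < nprocs; r++) {
        /* strict '>' keeps the earlier (lower) rank on ties */
        if (b[r] > b[best])
            best = r;
    }
    if (root != NULL)
        *root = best;   /* the root that would be passed to MPI_Bcast */
    return a[best];     /* the value every rank would receive */
}
```

With b = {0.5, 0.9, 0.9, 0.1}, ranks 1 and 2 tie on b, and rank 1 is chosen as the broadcast root.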

I guess I'm just wondering if there is a more efficient way of doing this than doing an allgather on a and b and then searching through the resulting arrays.

This can be achieved with a single MPI_Allreduce.

I will present two approaches: a simpler one (suitable for your use case) and a more generic one for more complex use cases. The latter also showcases MPI functionality such as custom MPI datatypes and custom MPI reduction operators.

Approach 1

To represent

int a = ...;
int b = ...;

you could use the following struct:

typedef struct MyStruct {
    int b;
    int a;
} S;

Then you can use the MPI datatype MPI_2INT and the MPI operator MPI_MAXLOC.

The operator MPI_MINLOC is used to compute a global minimum and also an index attached to the minimum value. MPI_MAXLOC similarly computes a global maximum and index. One application of these is to compute a global minimum (maximum) and the rank of the process containing this value.
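The MPI standard defines how MPI_MAXLOC combines two (value, index) pairs: keep the larger value; if the values are equal, keep the smaller index. The sketch below models that rule in plain C for the MPI_2INT layout; the names IntPair, maxloc_combine and maxloc_reduce are hypothetical, and no MPI calls are involved.

```c
#include <assert.h>

/* Same layout MPI expects for MPI_2INT: value first, index second. */
typedef struct {
    int val;
    int idx;
} IntPair;

/* The MPI_MAXLOC combine rule: keep the larger value;
 * if the values are equal, keep the smaller index. */
IntPair maxloc_combine(IntPair u, IntPair v)
{
    if (u.val > v.val) return u;
    if (v.val > u.val) return v;
    IntPair w = { u.val, (u.idx < v.idx) ? u.idx : v.idx };
    return w;
}

/* Fold the rule over one contribution per process, mimicking what an
 * MPI_Allreduce with MPI_MAXLOC leaves in every receive buffer. */
IntPair maxloc_reduce(const IntPair *contrib, int nprocs)
{
    assert(nprocs > 0);
    IntPair acc = contrib[0];
    for (int p = 1; p < nprocs; p++)
        acc = maxloc_combine(acc, contrib[p]);
    return acc;
}
```

Because the rule is commutative and associative, the result does not depend on the order in which contributions are combined. One consequence for using a as the index: when several processes share the largest b, the pair that comes back carries the smallest of their a values.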

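In your case, instead of the rank we will use the value of a as the index. This is also why b is the first member of the struct: MPI_2INT is laid out as (value, index), and MPI_MAXLOC compares the first member and carries the second along. Hence, the MPI_Allreduce call: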

 S  local, global;
 ...
 MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);

The complete code would look like the following:

#include <stdio.h>
#include <mpi.h>

typedef struct MyStruct {
    int b;
    int a;
} S;


int main(int argc,char *argv[]){
    MPI_Init(NULL,NULL); // Initialize the MPI environment
    int world_rank; 
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);
    
    // Some fake data
    S local, global;
    local.a = world_rank;
    local.b = world_size - world_rank;

    MPI_Allreduce(&local, &global, 1, MPI_2INT, MPI_MAXLOC, MPI_COMM_WORLD);
          
    if(world_rank == 0){
      printf("%d %d\n", global.b, global.a);
    }

    MPI_Finalize();
    return 0;
 }
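The question mentions several hundred values per process. Assuming these are independent (a, b) pairs, a single call with count set to the number of pairs should suffice, because MPI applies the reduction operation to each element of the buffer separately. The serial sketch below (hypothetical function maxloc_allreduce_model, no MPI calls) models that element-wise MPI_2INT/MPI_MAXLOC reduction, with one row of pairs per rank.

```c
#include <assert.h>

/* Same field order as the answer's struct: b (compared) first, a (carried)
 * second, which is the (value, index) layout MPI_2INT uses. */
typedef struct {
    int b;
    int a;
} S;

/* Serial model of
 *   MPI_Allreduce(local, global, n, MPI_2INT, MPI_MAXLOC, comm)
 * contrib holds nprocs rows of n pairs (row p = the send buffer of rank p).
 * Each of the n positions is reduced independently of the others. */
void maxloc_allreduce_model(const S *contrib, int nprocs, int n, S *global)
{
    assert(nprocs > 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        S acc = contrib[i];                   /* rank 0's i-th pair */
        for (int p = 1; p < nprocs; p++) {
            S v = contrib[p * n + i];         /* rank p's i-th pair */
            /* MPI_MAXLOC rule: larger b wins, ties keep the smaller a */
            if (v.b > acc.b || (v.b == acc.b && v.a < acc.a))
                acc = v;
        }
        global[i] = acc;
    }
}
```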

Approach 2

MPI_MAXLOC only works with a limited set of predefined value/index pair datatypes (MPI_FLOAT_INT, MPI_DOUBLE_INT, MPI_LONG_INT, MPI_2INT, MPI_SHORT_INT, MPI_LONG_DOUBLE_INT). In all of them the index member is an int, so it cannot carry, for example, a double a. For the remaining cases, you can use the following approach (based on this SO thread):

  1. Create a struct that will contain the values a and b;
  2. Create a custom MPI_Datatype representing the struct from step 1, to be sent across processes;
  3. Use MPI_Allreduce:

int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

Combines values from all processes and distributes the result back to all processes

  4. Use the operation MAX;

I'd like to send the best value of 'a' to every process and store it in 'c' where best is defined by having the largest value of 'b'.

  5. Then you have to tell MPI to compare only the element b of the struct, while carrying a along with it. No predefined operation does this for a custom struct, so you need to create a custom MPI_Op max operation.

Coding the approach

So let us break down the aforementioned implementation step by step:

First, define the struct:

typedef struct MyStruct {
    double a, b;
} S;

Second, create the custom MPI_Datatype:

void defineStruct(MPI_Datatype *tstype) {
    const int count = 2;
    int          blocklens[count];
    MPI_Datatype types[count];
    MPI_Aint     disps[count];

    for (int i=0; i < count; i++){
        types[i] = MPI_DOUBLE;
        blocklens[i] = 1;
    }
    disps[0] = offsetof(S,a);
    disps[1] = offsetof(S,b);

    MPI_Type_create_struct(count, blocklens, disps, types, tstype);
    MPI_Type_commit(tstype);
}

Very important: since we are using a struct, you have to be careful with the fact that (source)

the C standard allows arbitrary padding between the fields.

So reducing a struct with two doubles is NOT the same as reducing an array with two doubles. This is why the displacements above are computed with offsetof instead of being hard-coded.
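To see why the displacements are taken from offsetof rather than hard-coded, the sketch below prints the offsets the compiler actually chose for the struct S above and for a (double, int) struct like the one used with MPI_DOUBLE_INT in the first answer. DoubleInt, print_layout and doubleint_padding are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    double a, b;
} S;

/* A mixed-type struct like the one used with MPI_DOUBLE_INT. */
typedef struct {
    double value;
    int rank;
} DoubleInt;

/* Print the displacements that would be handed to MPI_Type_create_struct.
 * offsetof reports what the compiler actually did, padding included. */
void print_layout(void)
{
    printf("S:         a@%zu b@%zu sizeof=%zu\n",
           offsetof(S, a), offsetof(S, b), sizeof(S));
    printf("DoubleInt: value@%zu rank@%zu sizeof=%zu\n",
           offsetof(DoubleInt, value), offsetof(DoubleInt, rank),
           sizeof(DoubleInt));
}

/* Bytes of padding the compiler inserted into DoubleInt (0 if none). */
size_t doubleint_padding(void)
{
    size_t payload = sizeof(double) + sizeof(int);
    assert(sizeof(DoubleInt) >= payload);  /* guaranteed by the C standard */
    return sizeof(DoubleInt) - payload;
}
```

On common 64-bit platforms DoubleInt typically has 4 bytes of trailing padding (sizeof 16). If a custom struct type with trailing padding is reduced with count > 1, its MPI extent may also need to be set to sizeof(S) with MPI_Type_create_resized; for a struct of two doubles this is normally not an issue.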

In main, you then do:

MPI_Datatype structtype;
defineStruct(&structtype);

Third, create the custom max operation:

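void max_struct(void *in, void *inout, int *len, MPI_Datatype *type){
    S *invals    = in;
    S *inoutvals = inout;
    (void)type; // unused: this operation is only used with structtype
    for (int i=0; i < *len; i++)
        // Compare on b only, but copy the whole struct so that a stays paired with its b
        if (invals[i].b > inoutvals[i].b)
            inoutvals[i] = invals[i];
}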
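The essential property of this operation is that it compares only b but copies the whole struct, so that a stays attached to its b. The plain-C sketch below (hypothetical names max_by_b and reduce_all, with the MPI_Datatype argument dropped so it builds without MPI) folds the contributions in two different orders to check that the same a comes out. It also breaks ties on b by preferring the larger a; that rule is an assumption, not part of the original answer, but some fixed tie-break is what makes the operation genuinely commutative, as promised by passing 1 to MPI_Op_create.

```c
#include <assert.h>

typedef struct {
    double a, b;
} S;

/* Same (in, inout, len) shape as the MPI user function, minus the
 * MPI_Datatype argument. Compares on b, copies the whole struct, and
 * breaks ties on b by the larger a (assumed rule) for order independence. */
void max_by_b(const S *in, S *inout, int len)
{
    for (int i = 0; i < len; i++) {
        if (in[i].b > inout[i].b ||
            (in[i].b == inout[i].b && in[i].a > inout[i].a))
            inout[i] = in[i];
    }
}

/* Fold one contribution per process, left to right. */
S reduce_all(const S *contrib, int nprocs)
{
    assert(nprocs > 0);
    S acc = contrib[0];
    for (int p = 1; p < nprocs; p++)
        max_by_b(&contrib[p], &acc, 1);
    return acc;
}
```

The test data mirrors the answer's fake values for four ranks (a = rank, b = size - rank), so the result is a = 0, b = 4 in either order.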

In main, do:

MPI_Op       maxstruct;
MPI_Op_create(max_struct, 1, &maxstruct);

Finally, call MPI_Allreduce:

S local, global;
...
MPI_Allreduce(&local, &global, 1, structtype, maxstruct, MPI_COMM_WORLD); 

The entire code put together:

#include <assert.h>
#include <stddef.h>   // offsetof
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

typedef struct MyStruct {
    double a, b;
} S;

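void max_struct(void *in, void *inout, int *len, MPI_Datatype *type){
    S *invals    = in;
    S *inoutvals = inout;
    (void)type; // unused: this operation is only used with structtype
    for (int i=0; i<*len; i++)
        // Compare on b only, but copy the whole struct so that a stays paired with its b
        if (invals[i].b > inoutvals[i].b)
            inoutvals[i] = invals[i];
}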

void defineStruct(MPI_Datatype *tstype) {
    const int count = 2;
    int          blocklens[count];
    MPI_Datatype types[count];
    MPI_Aint     disps[count];

    for (int i=0; i < count; i++) {
        types[i] = MPI_DOUBLE;
        blocklens[i] = 1;
    }
    disps[0] = offsetof(S,a);
    disps[1] = offsetof(S,b);

    MPI_Type_create_struct(count, blocklens, disps, types, tstype);
    MPI_Type_commit(tstype);
}

int main(int argc,char *argv[]){
    MPI_Init(NULL,NULL); // Initialize the MPI environment
    int world_rank; 
    int world_size;
    MPI_Comm_rank(MPI_COMM_WORLD,&world_rank);
    MPI_Comm_size(MPI_COMM_WORLD,&world_size);
    MPI_Datatype structtype;
    MPI_Op       maxstruct;
    S  local, global;

    defineStruct(&structtype);
    MPI_Op_create(max_struct, 1, &maxstruct);

    // Just some random values
    local.a = world_rank;
    local.b = world_size - world_rank;

    MPI_Allreduce(&local, &global, 1, structtype, maxstruct, MPI_COMM_WORLD);  
          
    if(world_rank == 0){
      double c = global.a;
      printf("%f %f\n", global.b, c);
    }

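    // Release the user-defined operation and datatype
    MPI_Op_free(&maxstruct);
    MPI_Type_free(&structtype);
    MPI_Finalize();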
    return 0;
 }

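Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.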

 