Can I use MPI with shared memory?

I have written simulation software for highly parallelized execution, using MPI for internode parallelization and threads for intranode parallelization, reducing the memory footprint by using shared memory where possible. (The largest data structures are mostly read-only, so I can easily manage thread-safety.)

Although my program works fine (finally), I am having second thoughts about whether this approach is really best, mostly because managing two types of parallelization does require some messy asynchronous code here and there.

I found a paper (pdf draft) introducing a shared memory extension to MPI, allowing the use of shared data structures within MPI parallelization on a single node.

I am not very experienced with MPI, so my question is: is this possible with recent standard Open MPI implementations, and where can I find an introduction/tutorial on how to do it?

Note that I am not talking about how message passing is accomplished with shared memory; I know that MPI does that. I would like to (read-)access the same object in memory from multiple MPI processes.

This can be done. Here is some test code that sets up a small table on each shared-memory node. Only one process (node rank 0) actually allocates and initialises the table, but all processes on a node can read it.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(void)
{
  int i, flag;

  int nodesize, noderank;
  int size, rank;
  int tablesize, localtablesize;
  int *table, *localtable;
  int *model;

  MPI_Comm allcomm, nodecomm;
  MPI_Win wintable;

  char verstring[MPI_MAX_LIBRARY_VERSION_STRING];
  char nodename[MPI_MAX_PROCESSOR_NAME];

  MPI_Aint winsize;
  int windisp;

  int version, subversion, verstringlen, nodestringlen;

  allcomm = MPI_COMM_WORLD;

  tablesize = 5;

  MPI_Init(NULL, NULL);

  MPI_Comm_size(allcomm, &size);
  MPI_Comm_rank(allcomm, &rank);

  MPI_Get_processor_name(nodename, &nodestringlen);

  MPI_Get_version(&version, &subversion);
  MPI_Get_library_version(verstring, &verstringlen);

  if (rank == 0)
    {
      printf("Version %d, subversion %d\n", version, subversion);
      printf("Library <%s>\n", verstring);
    }

  // Create node-local communicator: one per shared-memory node

  MPI_Comm_split_type(allcomm, MPI_COMM_TYPE_SHARED, rank,
                      MPI_INFO_NULL, &nodecomm);

  MPI_Comm_size(nodecomm, &nodesize);
  MPI_Comm_rank(nodecomm, &noderank);

  // Only rank 0 on a node actually allocates memory

  localtablesize = 0;

  if (noderank == 0) localtablesize = tablesize;

  // debug info

  printf("Rank %d of %d, rank %d of %d in node <%s>, localtablesize %d\n",
         rank, size, noderank, nodesize, nodename, localtablesize);

  MPI_Win_allocate_shared(localtablesize*sizeof(int), sizeof(int),
                          MPI_INFO_NULL, nodecomm, &localtable, &wintable);

  // Check that the window uses the unified memory model

  MPI_Win_get_attr(wintable, MPI_WIN_MODEL, &model, &flag);

  if (1 != flag)
    {
      printf("Attribute MPI_WIN_MODEL not defined\n");
    }
  else
    {
      if (MPI_WIN_UNIFIED == *model)
        {
          if (rank == 0) printf("Memory model is MPI_WIN_UNIFIED\n");
        }
      else
        {
          if (rank == 0) printf("Memory model is *not* MPI_WIN_UNIFIED\n");

          MPI_Finalize();
          return 1;
        }
    }

  // Need to get a local pointer valid for the table on noderank 0

  table = localtable;

  if (noderank != 0)
    {
      MPI_Win_shared_query(wintable, 0, &winsize, &windisp, &table);
    }

  // All table pointers should now point to the copy on noderank 0

  // Initialise table on noderank 0 with appropriate synchronisation

  MPI_Win_fence(0, wintable);

  if (noderank == 0)
    {
      for (i=0; i < tablesize; i++)
        {
          table[i] = rank*tablesize + i;
        }
    }

  MPI_Win_fence(0, wintable);

  // Check we did it right

  for (i=0; i < tablesize; i++)
    {
      printf("rank %d, noderank %d, table[%d] = %d\n",
             rank, noderank, i, table[i]);
    }

  MPI_Finalize();

  return 0;
}
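
Any MPI-3 capable implementation should run this; with the usual wrappers it should build and launch with something along the lines of mpicc sharedtable.c -o sharedtable followed by mpirun -n 6 ./sharedtable (the source and executable names here are only placeholders, and the exact mpirun options depend on the MPI library used).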

Here is some sample output for 6 processes across two nodes:

Version 3, subversion 1
Library <SGI MPT 2.14  04/05/16 03:53:22>
Rank 3 of 6, rank 0 of 3 in node <r1i0n1>, localtablesize 5
Rank 4 of 6, rank 1 of 3 in node <r1i0n1>, localtablesize 0
Rank 5 of 6, rank 2 of 3 in node <r1i0n1>, localtablesize 0
Rank 0 of 6, rank 0 of 3 in node <r1i0n0>, localtablesize 5
Rank 1 of 6, rank 1 of 3 in node <r1i0n0>, localtablesize 0
Rank 2 of 6, rank 2 of 3 in node <r1i0n0>, localtablesize 0
Memory model is MPI_WIN_UNIFIED
rank 3, noderank 0, table[0] = 15
rank 3, noderank 0, table[1] = 16
rank 3, noderank 0, table[2] = 17
rank 3, noderank 0, table[3] = 18
rank 3, noderank 0, table[4] = 19
rank 4, noderank 1, table[0] = 15
rank 4, noderank 1, table[1] = 16
rank 4, noderank 1, table[2] = 17
rank 4, noderank 1, table[3] = 18
rank 4, noderank 1, table[4] = 19
rank 5, noderank 2, table[0] = 15
rank 5, noderank 2, table[1] = 16
rank 5, noderank 2, table[2] = 17
rank 5, noderank 2, table[3] = 18
rank 5, noderank 2, table[4] = 19
rank 0, noderank 0, table[0] = 0
rank 0, noderank 0, table[1] = 1
rank 0, noderank 0, table[2] = 2
rank 0, noderank 0, table[3] = 3
rank 0, noderank 0, table[4] = 4
rank 1, noderank 1, table[0] = 0
rank 1, noderank 1, table[1] = 1
rank 1, noderank 1, table[2] = 2
rank 1, noderank 1, table[3] = 3
rank 1, noderank 1, table[4] = 4
rank 2, noderank 2, table[0] = 0
rank 2, noderank 2, table[1] = 1
rank 2, noderank 2, table[2] = 2
rank 2, noderank 2, table[3] = 3
rank 2, noderank 2, table[4] = 4
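
As a follow-up, not part of the answer above but a hedged sketch: for a data structure that stays read-only after it has been filled in, which is the scenario in the question, a common alternative to fence epochs is passive-target synchronisation with MPI_Win_lock_all. The epoch can simply stay open for the rest of the run, and the MPI_Win_sync / MPI_Barrier / MPI_Win_sync sequence makes the initialisation visible to every rank on the node. The array size and the fill loop below are placeholders.

#include <stdio.h>
#include <mpi.h>

int main(void)
{
  MPI_Aint i, nelems = 1000000;   // size of the shared array: placeholder value
  int rank, noderank, windisp;
  double *data;                   // node-shared, read-only after initialisation
  MPI_Aint winsize;
  MPI_Comm nodecomm;
  MPI_Win win;

  MPI_Init(NULL, NULL);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, rank,
                      MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);

  // Only noderank 0 contributes memory; the others get a pointer to it
  MPI_Win_allocate_shared((noderank == 0) ? nelems*sizeof(double) : 0,
                          sizeof(double), MPI_INFO_NULL, nodecomm,
                          &data, &win);
  if (noderank != 0)
    MPI_Win_shared_query(win, 0, &winsize, &windisp, &data);

  // Open a passive-target epoch and keep it open while the data is in use
  MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

  if (noderank == 0)
    for (i = 0; i < nelems; i++)
      data[i] = (double) i;       // stand-in for reading the real input data

  MPI_Win_sync(win);              // make the writes visible
  MPI_Barrier(nodecomm);          // wait until initialisation has finished
  MPI_Win_sync(win);

  // From here on, every rank on the node can read data[] directly
  printf("rank %d sees data[42] = %f\n", rank, data[42]);

  MPI_Win_unlock_all(win);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}

The MPI_MODE_NOCHECK assertion is safe here because no conflicting exclusive locks are ever taken; with fences, as in the answer above, the same effect is achieved with two collective synchronisation points instead.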
