
Error occurred in MPI_Send on communicator MPI_COMM_WORLD MPI_ERR_RANK: invalid rank

I am trying to learn MPI. When I send data from one processor to another, I can send it successfully and receive it into a variable on the other process. However, when I try to send and receive on both processors, I get an invalid rank error.

Here is my program code:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
  int world_size;
  int rank;
  char hostname[256];
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  int tag = 4;
  int value = 4;
  int master = 0;
  int rec;
  MPI_Status status;
  // Initialize the MPI environment
  MPI_Init(&argc,&argv);

  // get the total number of processes
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // get the rank of current process
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // get the name of the processor
  MPI_Get_processor_name(processor_name, &name_len);

  // get the hostname
  gethostname(hostname,255);
  printf("World size is %d\n",world_size);

  if(rank == master){
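        // rank 0 exchanges a value with rank 1, so this path requires
        // world_size >= 2; with a single task, destination rank 1 does
        // not exist and MPI_Send fails with MPI_ERR_RANK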
        MPI_Send(&value,1,MPI_INT,1,tag,MPI_COMM_WORLD);
        MPI_Recv(&rec,1,MPI_INT,1,tag,MPI_COMM_WORLD,&status);
        printf("In master with value %d\n",rec);
  }
  if(rank == 1){
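        // rank 1 mirrors the exchange; note that both ranks send before
        // they receive, which relies on MPI buffering these small messages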
        MPI_Send(&tag,1,MPI_INT,0,tag,MPI_COMM_WORLD);
        MPI_Recv(&rec,1,MPI_INT,0,tag,MPI_COMM_WORLD,&status);
        printf("in slave with rank %d and value %d\n",rank, rec);
  }
  printf("Hello world!  I am process number: %d from processor %s on host %s out of %d processors\n", rank, processor_name, hostname, world_size);

  MPI_Finalize();

  return 0;
}

Here is my PBS file:

#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=1:00
#PBS -N MPIsample
#PBS -q edu_shared

#PBS -m abe
#PBS -M blahblah@blah.edu

#PBS -e mpitest.err
#PBS -o mpitest.out
#PBS -d /export/home/blah/MPIsample

mpirun -machinefile $PBS_NODEFILE -np $PBS_NP ./mpitest

The output file looks like this:

World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1
World size is 1

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Job complete

If the world size is 1, then the world size should be printed once, not 8 times.

The err file is:

[compute-0-34.local:13110] *** An error occurred in MPI_Send
[compute-0-34.local:13110] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13110] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13110] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13107] *** An error occurred in MPI_Send
[compute-0-34.local:13107] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13107] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13107] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13112] *** An error occurred in MPI_Send
[compute-0-34.local:13112] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13112] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13112] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13108] *** An error occurred in MPI_Send
[compute-0-34.local:13108] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13108] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13108] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13109] *** An error occurred in MPI_Send
[compute-0-34.local:13109] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13109] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13109] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13113] *** An error occurred in MPI_Send
[compute-0-34.local:13113] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13113] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13113] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13106] *** An error occurred in MPI_Send
[compute-0-34.local:13106] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13106] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13106] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-34.local:13111] *** An error occurred in MPI_Send
[compute-0-34.local:13111] *** on communicator MPI_COMM_WORLD
[compute-0-34.local:13111] *** MPI_ERR_RANK: invalid rank
[compute-0-34.local:13111] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

Two days ago I was able to send and receive successfully, but since then the same working code has been showing me this error. Is there a problem with my code, or with the HPC machine I am using?

From an MPI point of view, you did not launch one MPI job with 8 MPI tasks, but rather 8 independent MPI jobs with one MPI task each.
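Each of those singleton jobs sees world_size == 1, so the MPI_Send to rank 1 addresses a rank that does not exist, which is exactly the MPI_ERR_RANK abort in the err file. As a minimal sketch (assuming the same program structure as the question), a guard on world_size turns that crash into a clear diagnostic:

#include <mpi.h>
#include <stdio.h>

// Minimal sketch: fail fast with a clear message when the launch did not
// produce enough ranks, instead of hitting MPI_ERR_RANK inside MPI_Send.
int main(int argc, char **argv) {
  int world_size;
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (world_size < 2) {
    // Each mislaunched singleton job takes this branch, pointing
    // straight at the launcher/library mismatch.
    fprintf(stderr, "Expected at least 2 ranks, got %d\n", world_size);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  // ... rank 0 <-> rank 1 exchange as in the question ...

  MPI_Finalize();
  return 0;
}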

Launching several singleton jobs like this typically happens when you mix two MPI implementations (e.g., your application was built with Open MPI, but you are using the mpirun from MPICH).

Before invoking mpirun, I suggest you add the following to your PBS script:

which mpirun
ldd mpitest

This makes sure that mpirun and the MPI libraries come from the same implementation (e.g., the same vendor and the same version).

It turned out to be a problem with the HPC: it was not allocating the number of processors I asked for. Thank you, guys.
