
MPI Segmentation fault (signal 11)

I have been trying for more than two days to find my mistake, but I couldn't find anything. I keep getting the following error:

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

= EXIT CODE: 139

= CLEANING UP REMAINING PROCESSES

= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

This typically refers to a problem with your application.

Please see the FAQ page for debugging suggestions

make: *** [run] Error 139

So the problem is clearly in MPI_BCAST, and in another function I have MPI_GATHER. Can you help me figure out what's wrong? I compile the code with the following command:

/usr/bin/mpicc  -I/usr/include   -L/usr/lib  z.main.c  z.mainMR.c  z.mainWR.c  -o  1dcode -g  -lm

To run:

/usr/bin/mpirun -np 2 ./1dcode dat.txt o.out.txt

For example, my code includes this function:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <math.h>
#include <string.h>
#include "functions.h"
#include <mpi.h>
/*...................z.mainMR master function............. */
void MASTER(int argc, char *argv[], int nPROC, int nWRs, int mster)
{

/*... Define all the variables we going to use in z.mainMR function..*/
double tend, dtfactor, dtout, D, b, dx, dtexpl, dt, time;
int MM, M, maxsteps, nsteps;
FILE *datp, *outp;
/*.....Reading the data file "dat" then saving the data in o.out.....*/
datp = fopen(argv[1],"r"); // Open the file in read mode
outp = fopen(argv[argc-1],"w"); // Open output file in write mode
if(datp != NULL) // If data file is not empty continue
{
fscanf(datp,"%d %lf %lf %lf %lf %lf",&MM,&tend,&dtfactor,&dtout,&D,&b);    // read the data
fprintf(outp,"data>>>\nMM=%d\ntend=%lf\ndtfactor=%lf\ndtout=%lf\nD=%lf\nb=%lf\n",MM,tend,dtfactor,dtout,D,b);
fclose(datp); // Close the data file
fclose(outp); // Close the output file
}
else // If the file is empty then print an error message
{
    printf("There is something wrong. Maybe file is empty.\n");
}

/*.... Find dx, M, dtexpl, dt and the maxsteps........*/
dx = 1.0/ (double) MM;
M = b * MM;
dtexpl = (dx * dx) / (2.0 * D);
dt = dtfactor * dtexpl;
maxsteps = (int)( tend / dt ) + 1;

/*...Pack integers in iparms array, reals in parms array...*/
int iparms[2] = {MM,M};
double parms[4] = {dx, dt, D, b}; 
MPI_BCAST(iparms,2, MPI_INT,0,MPI_COMM_WORLD);
MPI_BCAST(parms, 4, MPI_DOUBLE,0, MPI_COMM_WORLD);
}

The runtime error is due to an unfortunate combination of a specific trait of MPICH and a feature of the C language.

MPICH provides both C and Fortran interface code within a single library file:

000000000007c7a0 W MPI_BCAST
00000000000cd180 W MPI_Bcast
000000000007c7a0 W PMPI_BCAST
00000000000cd180 T PMPI_Bcast
000000000007c7a0 W mpi_bcast
000000000007c7a0 W mpi_bcast_
000000000007c7a0 W mpi_bcast__
000000000007c7a0 W pmpi_bcast
000000000007c7a0 T pmpi_bcast_
000000000007c7a0 W pmpi_bcast__

The Fortran calls are exported under a variety of aliases, including the all-uppercase MPI_BCAST, in order to support many different Fortran compilers at the same time. MPI_BCAST itself is not declared in mpi.h, but ANSI C (C89) allows calling functions without a preceding prototype declaration. Enabling C99 by passing -std=c99 to the compiler would have resulted in a warning about the implicit declaration of the MPI_BCAST function. -Wall would also have produced that warning. The code would fail to link with Open MPI, which provides the Fortran interface in a separate library that mpicc does not link against.

Even if the code compiles and links properly, Fortran functions expect all their arguments to be passed by reference. Also, Fortran MPI calls take an additional output argument in which the error code is returned. The C-style call passes values such as the count 2 and the root 0 where addresses are expected, and it omits the error argument, so the Fortran routine dereferences invalid pointers. Hence the segmentation fault.
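The difference between the two calling conventions can be sketched with a pair of stand-in functions. This is only an illustration: c_style_fill, fortran_style_fill and demo are hypothetical names, not MPI functions, and the way the Fortran-style wrapper forwards to the C-style function is an assumption about how such bindings are typically layered, not a copy of MPICH's source.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT MPI functions. */

/* C-style: count and value are passed by value, and the status is the
   return value (0 on success, non-zero on error). */
int c_style_fill(int *buf, int count, int value)
{
    if (buf == NULL || count < 0)
        return 1;
    for (int i = 0; i < count; i++)
        buf[i] = value;
    return 0;
}

/* Fortran-style: every argument is an address, and the status comes back
   through a trailing output argument instead of the return value. */
void fortran_style_fill(int *buf, int *count, int *value, int *ierr)
{
    /* Each pointer is dereferenced here. If a caller passed the plain
       integer 2 where 'count' is expected, this line would read from
       address 2 and most likely die with a segmentation fault. */
    *ierr = c_style_fill(buf, *count, *value);
}

/* Calls each stand-in the way its convention requires.
   Returns 0 when both calls succeed and fill their buffers identically. */
int demo(void)
{
    int a[2] = {0, 0};
    int b[2] = {0, 0};
    int count = 2, value = 7, ierr = -1;

    int status = c_style_fill(a, 2, 7);            /* literals are fine   */
    fortran_style_fill(b, &count, &value, &ierr);  /* addresses plus ierr */

    return (status == 0 && ierr == 0 && a[0] == b[0] && a[1] == b[1]) ? 0 : 1;
}
```

Calling demo() from a main function exercises both conventions correctly. The failing program in the question corresponds to calling the Fortran-style function with the C-style argument list, which only compiles because no prototype is in scope to reject it.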

To prevent such errors in the future, compile with -Wall -Werror, which should catch similar problems as early as possible.

Just so this has a formal answer: you spelled MPI_Bcast as MPI_BCAST. I would have assumed that this would throw a linking error at you for trying to access a function that doesn't exist, but apparently it didn't.

My guess is that your MPI implementation defines both the Fortran and C MPI functions in the same header file. Your program was then accidentally calling the Fortran function MPI_BCAST, and the types did not match up (MPI_INTEGER (Fortran) is not necessarily MPI_INT (C)), somehow giving you the segfault.
