
Why do I get a stack smashing error using MPI_Bcast and -O3 compiler flag, but everything works without -O3?

I am pretty new to MPI, so apologies if this is simple.

I have some code from a month or two ago that has been working fine, but I decided to go back and revise it. (It was written when I was just starting out, and it's not a performance critical section.) The code basically generates a random graph on one process and then shares the results with all other processes. An excerpt from the baby's-first-steps version follows:

unsigned int *graph;

if (commrank == 0) {
    graph = gengraph(params); //allocates graph memory in function
    if (commsize > 1) {
        for (int k=1; k<commsize; k++) 
            MPI_Send(graph, n*n, MPI_UNSIGNED, k, 0, MPI_COMM_WORLD);
    }
} else {
    MPI_Status recvStatus;
    graph = malloc(sizeof(unsigned int)*n*n);
    MPI_Recv(graph, n*n, MPI_UNSIGNED, 0, 0, MPI_COMM_WORLD, &recvStatus);
}

While obviously naive, this worked just fine for a while, before I chose to go back and do it in what I thought was the proper fashion:

if (commrank == 0) {
    graph = gengraph(params);
    MPI_Bcast(graph, n*n, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
} else {
    graph = malloc(sizeof(unsigned int)*n*n);
    MPI_Bcast(graph, n*n, MPI_UNSIGNED, 0, MPI_COMM_WORLD);
}

The problem is, I keep getting "stack smashing" errors in the second version when I compile with -O3 optimization, though it works fine when compiled unoptimized. Note that I have checked the graph allocation function multiple times and debugged it, and it appears to be fine. I have also debugged the second version, and it appears to work fine. The crash occurs later when I try to free the graph memory. (Note that this is not a double free error, and, again, it works fine in the naive implementation and has for some time.)

One final wrinkle: The first version also fails if, instead of using the recvStatus variable, I use MPI_STATUS_IGNORE. And, again, this only fails with -O3.

Any thoughts would be greatly appreciated. If it's any help, I'm using mpicc on top of gcc 7.5.0, but I imagine I'm doing something stupid rather than encountering a compiler problem.

I changed the mpicc compiler to Clang and used Address Sanitizer, per the suggestion of @hristo-iliev, and found an error in a subsequent MPI call (a recv with the wrong count size). This led to the undefined behavior. Notably, the address sanitizer pinpointed the location of the error quite clearly, while valgrind only gave rather opaque indications that something was going on in MPI (as, well, it always does).

Apologies to the StackOverflow community for this, as the code above was not the culprit (not entirely surprising). It was just some standard undefined behavior due to sloppiness.
