Segmentation fault with -O2 in Fortran

Original posted 2018-02-11 09:46:14. Tags: linux, debugging, optimization, segmentation-fault, fortran

While developing a large Fortran program I have now and again come across this error, in particular when compiling with -O2 to get somewhat better performance.

In some cases the error is real and can be corrected, but in other cases I find no error and assume it is caused by -O2 rearranging the code. My old-fashioned debugging technique is to add write statements close to the point where the error occurs, and I have found that the error often disappears when I do that.

Maybe this is because the -O2 optimization is more careful about rearranging code around such statements.

Recently I had this error in a loop that was neither very complex nor very time-critical, and adding a write statement inside the loop prevented it. When I removed the write statement the error came back. To avoid creating a lot of meaningless output while running the program, I found it was sufficient to write to an internal character variable, so nothing changed for the user.
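
For readers unfamiliar with the workaround, a minimal sketch of an internal write is given here. The program, the loop body, and the names `scratch` and `total` are hypothetical and are not taken from the original code; the sketch only shows the shape of a write to a character variable that produces no output.

```fortran
program internal_write_demo
  implicit none
  character(len=32) :: scratch   ! hypothetical scratch buffer for the internal write
  integer :: i
  real :: total

  total = 0.0
  do i = 1, 10
    total = total + real(i)      ! stand-in for the real loop body
    ! Internal write: formats i into scratch instead of a file or the
    ! terminal, so the user sees no extra output.
    write(scratch, '(i0)') i
  end do

  print *, total
end program internal_write_demo
```

A write like this changes what the optimizer can do with the loop, which may be why the symptom disappears. It does not show that the underlying cause has been removed.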

There was no error when compiling the code without -O2, but the loop is inside a module using many local variables, and I do not know how to compile one subroutine in a module separately without -O2.

I am using GNU Fortran 7.2.0 on Linux and Windows (this recent error occurred only on Linux, but I have previously had similar problems on Windows). I do not have access to any other compilers, but my code is free and has been compiled with other compilers with no problems reported.

So my question is whether one can turn off -O2 for a small part of the code inside a module, or whether there are better alternatives than adding write statements to prevent -O2 from rearranging the code inside a particular subroutine.

1 solution

I arrived here via a web search, and in case it helps anyone, I want to share the solution in my case. Many commenters have mentioned the possibility of uninitialized variables. I had no such warning in my code, but I produced an apparently identical effect by allocating storage in a subroutine in such a way that the compiler presumably did not realize that the freshly allocated memory was about to be used in the next line.

I did this by accidentally declaring an allocatable array in a module and then also passing it as an argument to a subroutine. The subroutine first did the allocation (using the name in the module) and immediately thereafter started assigning values to array elements using the name that I had passed in as an argument. When I moved from -O1 to -O2, it stopped working. Rather than detecting the problem, -fcheck=bounds actually caused the code to run without the segfault.

I located the offending line by using -O2 -g -traceback without bounds checking. When I started referring to the array consistently by the name that I had passed as an argument, the problem went away.
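
The fix described above can be sketched as follows. This is a reconstruction under assumptions, not the original code: the module names `work_data` and `filler`, the array `buf`, the subroutine `fill`, and the choice of an allocatable dummy argument are all hypothetical. The problematic pattern appears only in a comment. The Fortran standard requires that, while an array is associated with a dummy argument, any change to its allocation status be made through that dummy argument, so allocating through the module name leaves the compiler free to assume the dummy is unchanged.

```fortran
module work_data
  implicit none
  real, allocatable :: buf(:)          ! hypothetical module-level array
end module work_data

module filler
  implicit none
contains
  subroutine fill(a, n)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    integer :: i

    ! Problematic pattern (deliberately not used here):
    !   use work_data, only: buf
    !   allocate(buf(n))   ! allocation through the module name ...
    !   a(i) = ...         ! ... then assignment through the dummy name
    ! Here the array is referred to only by the dummy name a.
    if (allocated(a)) deallocate(a)
    allocate(a(n))
    do i = 1, n
      a(i) = real(i)
    end do
  end subroutine fill
end module filler

program alias_fix_demo
  use work_data, only: buf
  use filler, only: fill
  implicit none

  call fill(buf, 4)                    ! buf is reached only through the argument inside fill
  print *, buf
end program alias_fix_demo
```

With gfortran, the option that prints a backtrace on a runtime error is spelled `-fbacktrace` (`-traceback` is the Intel compiler's spelling), so a comparable build might be `gfortran -O2 -g -fbacktrace alias_fix_demo.f90`, where the file name is again only an example.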