Pointer is being masked when calling a C function from Fortran
When I pass an array from Fortran to C, the array's address is incorrect in C. I checked this by printing the address of the array in Fortran before the CALL, then stepping into the C function and printing the address of the argument.
Fortran pointer: 0x9acd44c0
C pointer: 0xffffffff9acd44c0
The upper dword of the C pointer has been set to 0xffffffff. I'm trying to understand why this is happening, and why it happens only on the HPC cluster and not on a development machine.
I'm working with a rather large scientific program written in Fortran/C++/CUDA. On one particular machine, I get a segfault when calling a C function from Fortran. I've found that a pointer is being passed to the C function with some bytes set incorrectly.
Every Fortran file in the program includes a common header file which sets up some options and declares the common blocks.
IMPLICIT REAL*8 (A-H,O-Z)
COMMON/NBODY/ X(3,NMAX), BODY(NMAX)
COMMON/GPU/ GPUPHI(NMAX)
The Fortran call site looks like this:
CALL GPUPOT(NN,BODY(IFIRST),X(1,IFIRST),GPUPHI)
And the C function, which is compiled by nvcc, is declared like so:
extern "C" void gpupot_(int *n,
double m[],
double x[][3],
double pot[]);
Debugging showed that the value of the pointer to pot is incorrect, so any attempt to access that array will segfault.
When I ran the program with gdb, I put a breakpoint just before the call to gpupot and printed the address of the GPUPHI variable:
(gdb) p &GPUPHI
$1 = (PTR TO -> ( real(kind=8) (1050000))) 0x9acd44c0 <gpu_>
I then let the debugger step into the gpupot_ C function and inspected the value of the pot argument:
(gdb) p pot
$2 = (double *) 0xffffffff9acd44c0
All of the other arguments have the correct pointer values.
The compiler options set for gfortran are:
-fPIC -O3 -ffast-math -Wall -fopenmp -mcmodel=medium -march=native -mavx -m64
And nvcc is using the following:
-ccbin=g++ -Xptxas -v -ftz=true -lineinfo -D_FORCE_INLINES \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_35,code=compute_35 -Xcompiler \
"-O3 -fPIC -Wall -fopenmp -std=c++11 -fPIE -m64 -mavx \
-march=native" -std=c++14 -lineinfo
For debugging, the -O3 is replaced with -g -O0 -fcheck=all -fstack-protector -fno-omit-frame-pointer, but the behaviour (crash) remains the same.
This is prefaced by my top comments [and yours].
It looks like you're getting an [unwanted] sign extension of the address.
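To illustrate what sign extension does to the address from the question, here is a minimal sketch in C. The function names widen_signed and widen_unsigned are hypothetical and exist only for this illustration; the sketch does not claim to reproduce the exact instruction sequence the toolchain emits, only the arithmetic effect of treating a 32-bit address as signed before widening it to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a 32-bit address as if it were a SIGNED value.
   If bit 31 is set, the value is negative, so the upper 32 bits of the
   64-bit result become all ones. */
uint64_t widen_signed(uint32_t addr32)
{
    /* Compute the signed interpretation with fully defined arithmetic. */
    int64_t as_signed = (addr32 & 0x80000000u)
                            ? (int64_t)addr32 - 0x100000000LL
                            : (int64_t)addr32;
    uint64_t result = (uint64_t)as_signed;

    /* The low 32 bits are always preserved; only the upper half changes. */
    assert((uint32_t)result == addr32);
    return result;
}

/* Hypothetical helper: widen the same address as an UNSIGNED value.
   The upper 32 bits stay zero, which is what a valid pointer needs here. */
uint64_t widen_unsigned(uint32_t addr32)
{
    return (uint64_t)addr32;
}
```

With the Fortran address 0x9acd44c0, widen_signed yields 0xffffffff9acd44c0 (the bad value seen in gdb) while widen_unsigned yields 0x9acd44c0. Addresses below 0x80000000 come out identical either way, which is consistent with smaller objects linked low in memory not showing the problem.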
The Fortran code is being built with -mcmodel=medium, but the C code is not.
With that option, larger symbols/arrays will be linked above 2GB [where a 32-bit address has its sign bit set].
So, to fix the problem, either add the option to both the gfortran flags and the nvcc -Xcompiler flags, or leave it off both.
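While checking whether the flag change took effect, a small guard on the C side can make the failure obvious instead of a bare segfault. The following is only a sketch: looks_sign_extended is a hypothetical helper, not part of the program in the question, and it assumes 64-bit pointers and the usual x86-64 Linux layout in which user-space addresses never have their upper 32 bits all set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: this sketch only makes sense with 64-bit pointers. */
static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

/* Hypothetical helper: returns 1 if p has bits 63..31 all set, which is the
   pattern produced when a 32-bit address with bit 31 set is sign-extended
   to 64 bits; returns 0 otherwise. */
int looks_sign_extended(const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;

    /* Shifting right by 31 leaves the top 33 bits; all ones is 0x1ffffffff. */
    return (v >> 31) == 0x1ffffffffull;
}
```

Calling it on each pointer argument at the top of gpupot_ (for example on pot) and logging when it returns 1 would flag the value 0xffffffff9acd44c0 from the question, while the correct address 0x9acd44c0 passes.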