
Pointer is being masked when calling a C function from Fortran

TL;DR

When I pass an array from Fortran to C, the array's address is incorrect in C. I've checked this by printing the address of the array in Fortran before the CALL, then stepping into the C function and printing the address of the argument.

  • The Fortran pointer: 0x9acd44c0
  • The C pointer: 0xffffffff9acd44c0

The upper dword of the C pointer has been set to 0xffffffff. I'm trying to understand why this happens, and why it happens only on the HPC cluster and not on a development machine.

Context

I'm working with a rather large scientific program written in Fortran/C++/CUDA. On one particular machine, I get a segfault when calling a C function from Fortran. I've found that a pointer is being passed to the C function with some bytes set incorrectly.

Code Snippets

Every Fortran file in the program includes a common header file which sets up some options and declares the common blocks.

IMPLICIT REAL*8  (A-H,O-Z)
COMMON/NBODY/  X(3,NMAX), BODY(NMAX)
COMMON/GPU/    GPUPHI(NMAX)

The Fortran call site looks like this:

CALL GPUPOT(NN,BODY(IFIRST),X(1,IFIRST),GPUPHI)

And the C function, which is compiled by nvcc, is declared like so:

extern "C" void gpupot_(int *n,
                       double m[],
                       double x[][3],
                       double pot[]);
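As a rough illustration of how the call site lines up with this declaration: Fortran passes arguments by reference, so BODY(IFIRST) arrives as a pointer to element IFIRST, X(1,IFIRST) as a pointer to the first coordinate of body IFIRST, and GPUPHI as a pointer to the start of the array. The sketch below is plain C with a hypothetical stand-in named `gpupot_stub_`; its body is placeholder arithmetic and not the real GPU code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same parameter list as gpupot_.
 * Every argument is an address, because Fortran passes by reference. */
void gpupot_stub_(int *n, double m[], double x[][3], double pot[])
{
    assert(n != NULL && m != NULL && x != NULL && pot != NULL);

    for (int i = 0; i < *n; ++i) {
        /* x[i][0..2] corresponds to X(1..3, IFIRST+i) on the Fortran side,
         * because Fortran stores X(3,NMAX) in column-major order. */
        pot[i] = m[i] * (x[i][0] + x[i][1] + x[i][2]); /* placeholder only */
    }
}
```

From C, the equivalent of the Fortran call would be `gpupot_stub_(&nn, &body[ifirst - 1], &x[ifirst - 1], phi)`, with `ifirst` being the 1-based Fortran index. A corrupted value in any of these addresses, such as the one seen for pot, makes the first array access fault.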

GDB Output

I found from debugging that the value of the pointer to pot is incorrect, so any attempt to access that array will segfault.

When I ran the program under gdb, I put a breakpoint just before the call to gpupot and printed the address of the GPUPHI variable:

(gdb) p &GPUPHI   
$1 = (PTR TO -> ( real(kind=8) (1050000))) 0x9acd44c0 <gpu_>

I then let the debugger step into the gpupot_ C function and inspected the value of the pot argument:

(gdb) p pot
$2 = (double *) 0xffffffff9acd44c0

All of the other arguments have the correct pointer values.

Compiler options

The compiler options that are set for gfortran are:

 -fPIC -O3 -ffast-math -Wall -fopenmp -mcmodel=medium -march=native -mavx -m64  

And nvcc is using the following:

-ccbin=g++ -Xptxas -v -ftz=true -lineinfo -D_FORCE_INLINES \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_35,code=compute_35 -Xcompiler \
"-O3 -fPIC -Wall -fopenmp -std=c++11 -fPIE -m64 -mavx \
-march=native" -std=c++14 -lineinfo 

For debugging, the -O3 is replaced with -g -O0 -fcheck=all -fstack-protector -fno-omit-frame-pointer, but the behaviour (crash) remains the same.

This is prefaced by my top comments [and yours].

It looks like you're getting an [unwanted] sign extension of the address.

The Fortran code is being built with -mcmodel=medium, but the C code is not.

With that option, larger symbols/arrays will be linked above 2GB [an address which has bit 31, the 32-bit sign bit, set].
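To make the arithmetic concrete, here is a minimal C sketch of what sign-extending a 32-bit address does. It models only the bit manipulation, not what the compiler or linker actually emits, and the helper names (`sign_extend_32`, `zero_extend_32`, `looks_sign_extended`) are hypothetical.

```c
#include <stdint.h>

/* Widen a 32-bit value to 64 bits by copying bit 31 into the upper dword,
 * as a sign-extending load or relocation would. */
uint64_t sign_extend_32(uint32_t low)
{
    uint64_t wide = low;
    if (low & UINT32_C(0x80000000)) {
        wide |= UINT64_C(0xffffffff00000000);
    }
    return wide;
}

/* Widen a 32-bit value to 64 bits with the upper dword cleared. */
uint64_t zero_extend_32(uint32_t low)
{
    return (uint64_t)low;
}

/* Heuristic check: upper dword is all ones and bit 31 is set, which is
 * the pattern a sign-extended address above 2GB would have. */
int looks_sign_extended(uint64_t addr)
{
    return (addr >> 32) == UINT32_C(0xffffffff)
        && (addr & UINT32_C(0x80000000)) != 0;
}
```

With the address from the question, `sign_extend_32(0x9acd44c0)` gives 0xffffffff9acd44c0, the value seen in C, while `zero_extend_32(0x9acd44c0)` gives 0x9acd44c0, the value seen in Fortran. An address below 2GB, such as 0x7fffffff, comes out the same either way, which would be consistent with the problem only appearing once the array is placed above 2GB.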

So, to fix the problem, either add the option to both builds or leave it off both.
