
Pointer is being masked when calling a C function from Fortran

TL;DR

When I pass an array from Fortran to C, the array's address is incorrect in C. I've checked this by printing the address of the array in Fortran before the CALL, then stepping into the C function and printing the address of the argument.

  • The Fortran pointer: 0x9acd44c0
  • The C pointer: 0xffffffff9acd44c0

The upper dword of the C pointer has been set to 0xffffffff. I'm trying to understand why this is happening, and why it happens only on the HPC cluster and not on a development machine.

Context

I'm using a rather large scientific program written in Fortran/C++/CUDA. On some particular machine, I get a segfault when calling a C function from Fortran. I've found that a pointer is being passed to the C function with some bytes set incorrectly.

Code Snippets

Every Fortran file in the program includes a common header file which sets up some options and declares the common blocks.

IMPLICIT REAL*8  (A-H,O-Z)
COMMON/NBODY/  X(3,NMAX), BODY(NMAX)
COMMON/GPU/    GPUPHI(NMAX)

The Fortran call site looks like this:

CALL GPUPOT(NN,BODY(IFIRST),X(1,IFIRST),GPUPHI)

And the C function, which is compiled by nvcc, is declared like so:

extern "C" void gpupot_(int *n,
                       double m[],
                       double x[][3],
                       double pot[]);

GDB Output

I found from debugging that the value of the pointer to pot is incorrect; so any attempt to access that array will segfault.

When I ran the program with gdb, I put a break point just before the call to gpupot and printed the value of the GPUPHI variable:

(gdb) p &GPUPHI   
$1 = (PTR TO -> ( real(kind=8) (1050000))) 0x9acd44c0 <gpu_>

I then let the debugger step into the gpupot_ C function, and inspected the value of the pot argument:

(gdb) p pot
$2 = (double *) 0xffffffff9acd44c0

All of the other arguments have the correct pointer values.

Compiler options

The compiler options that are set for gfortran are:

 -fPIC -O3 -ffast-math -Wall -fopenmp -mcmodel=medium -march=native -mavx -m64  

And nvcc is using the following:

-ccbin=g++ -Xptxas -v -ftz=true -lineinfo -D_FORCE_INLINES \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_35,code=compute_35 -Xcompiler \
"-O3 -fPIC -Wall -fopenmp -std=c++11 -fPIE -m64 -mavx \
-march=native" -std=c++14 -lineinfo 

For debugging, the -O3 is replaced with -g -O0 -fcheck=all -fstack-protector -fno-omit-frame-pointer, but the behaviour (crash) remains the same.

This is prefaced by my top comments [and yours].

It looks like you're getting an [unwanted] sign extension of the address.

The Fortran code is compiled with -mcmodel=medium, but the C/C++ code is not: the option does not appear in the -Xcompiler string passed to nvcc.

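With that option, large symbols and arrays (such as the GPUPHI common block) may be linked above 2 GB, where bit 31 of the address is set. If such an address is handled as a signed 32-bit value anywhere along the way, widening it to 64 bits copies that bit into the upper dword, which is how 0x9acd44c0 becomes 0xffffffff9acd44c0.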
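
The arithmetic behind the bad pointer can be reproduced in isolation. The following is a minimal C sketch, not code from the program in question; the helper name sign_extend_32 is made up for illustration. It takes the low 32 bits of an address and widens them the way a signed 32-bit value would be widened to 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original program): widen the low
 * 32 bits of an address as if they had been stored as a signed 32-bit value. */
uint64_t sign_extend_32(uint64_t addr)
{
    uint64_t low = addr & 0xffffffffULL;       /* keep only the low dword */
    if (low & 0x80000000ULL) {                 /* bit 31 acts as the sign bit */
        return low | 0xffffffff00000000ULL;    /* sign bit is copied into the upper dword */
    }
    return low;                                /* below 2 GB: value is unchanged */
}
```

Passing the Fortran address 0x9acd44c0 through this helper gives the value gdb showed for pot, while any address below 2 GB passes through untouched. That would also be consistent with the other arguments looking correct, if they happen to live below 2 GB.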

So, to fix the problem, either add the option to both compilers (for nvcc, inside the -Xcompiler string) or remove it from both.
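
While testing either change, a cheap guard on the C side can flag this failure mode before the first dereference. This is only a sketch, assuming a 64-bit x86-64 Linux process, where user-space addresses lie in the lower half of the address space; looks_sign_extended is a hypothetical name, not part of the program.

```c
#include <stdint.h>

/* Hypothetical diagnostic: returns 1 when bits 31..63 of the pointer are
 * all ones, the pattern left behind by sign-extending a 32-bit value. */
int looks_sign_extended(const void *p)
{
    uint64_t v = (uint64_t)(uintptr_t)p;
    return (v >> 31) == 0x1ffffffffULL;   /* the top 33 bits are all set */
}
```

Calling it on pot at the top of gpupot_ and printing a message when it returns 1 would turn the segfault into a readable diagnostic.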
