
OpenACC nvlink undefined reference to class

I am new to OpenACC and am writing a new program from scratch (I have a fairly good idea of which loops will be computationally costly from working on a similar problem before). I am getting an "Undefined reference" error from nvlink. From my research, this happens because no device code is being generated for the class I created. However, I don't understand why this is happening or how to fix it.

Below is a minimal working example (MWE) from my code.

include/vec1.h

#ifndef VEC1_H
#define VEC1_H

class Vec1{
    
public:
    double data[1];

    #pragma acc routine seq
    Vec1();
    #pragma acc routine seq
    Vec1(double x);
    
    #pragma acc routine seq
    Vec1 operator* (double x);

};

#endif

src/vec1.cpp

#include "vec1.h"

Vec1::Vec1(){
    data[0] = .0;
}

Vec1::Vec1(double x){
    data[0] = x;
}


Vec1 Vec1::operator*(double c){
    Vec1 r = Vec1(0.);
    r.data[0] = c*data[0];
    return r;
}

vec1_test_gpu.cpp

#include "vec1.h"

#define NUM_VECTORS 1000000

int main(){
    
    Vec1 vec1_array[NUM_VECTORS];
    for(int iv=0; iv<NUM_VECTORS; ++iv){
        vec1_array[iv] = Vec1(iv);
    }
    #pragma acc data copyin(vec1_array)
    
    #pragma acc parallel loop
    for(int iv=0; iv<NUM_VECTORS; ++iv){
        vec1_array[iv] = vec1_array[iv]*2;
    }
    return 0;
}

I compile them as follows:

$ nvc++ src/vec1.cpp -c -I./include -O3 -march=native -ta=nvidia:cuda11.2 -fPIC
$ nvc++ -shared -o libvec1.so vec1.o
$ nvc++ vec1_test_gpu.cpp -I./include -O3 -march=native -ta=nvidia:cuda11.2 -L./ -lvec1

The error appears right after the last command and reads: nvlink error: Undefined reference to '_ZN4Vec1mlEd' in '/tmp/nvc++jOtCBiT_m38d.o'

The problem is that the kernel in the main program calls a device routine, "Vec1::operator*", that lives in a shared object. nvc++'s OpenACC implementation uses CUDA to target NVIDIA devices. Since CUDA doesn't have a dynamic linker for device code, at least not yet, this isn't supported, so nvlink cannot resolve the symbol when it links the device code of the main program.
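To confirm which routine nvlink is complaining about, the mangled symbol from the error message can be demangled. The following is a minimal sketch, assuming a GCC- or Clang-compatible toolchain that provides `<cxxabi.h>`; the helper name `demangle` is hypothetical and not part of the original code.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the human-readable form of a mangled
// C++ symbol, or the input unchanged if demangling fails.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so the caller must release it with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);
    return result;
}
```

Passing "_ZN4Vec1mlEd" to this helper yields the name of the multiplication operator from the question, which matches the routine identified above. The command-line tool c++filt performs the same translation.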

You'll need to either link vec1.o statically into the executable, or move the "parallel loop" into the shared object.
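The second option can be sketched as follows. This is only an illustration under stated assumptions: the function `scale_all` is a hypothetical entry point that would be added to vec1.h and vec1.cpp, so that the kernel and the device routine it calls are compiled into the same shared object. The class is repeated here so the block is self-contained, and the sketch has not been verified on GPU hardware.

```cpp
// Same class as in the question (would live in include/vec1.h).
class Vec1 {
public:
    double data[1];

    #pragma acc routine seq
    Vec1();
    #pragma acc routine seq
    Vec1(double x);

    #pragma acc routine seq
    Vec1 operator*(double x);
};

// Definitions (would live in src/vec1.cpp, inside libvec1.so).
Vec1::Vec1() { data[0] = 0.0; }
Vec1::Vec1(double x) { data[0] = x; }

Vec1 Vec1::operator*(double c) {
    Vec1 r(0.0);
    r.data[0] = c * data[0];
    return r;
}

// Hypothetical library entry point: the parallel loop now sits in the
// same object as the device routine, so no device symbol has to be
// resolved across the shared-object boundary.
void scale_all(Vec1* v, int n, double c) {
    // copy(...) moves the array to the device and back, so the host
    // sees the updated values after the loop.
    #pragma acc parallel loop copy(v[0:n])
    for (int i = 0; i < n; ++i) {
        v[i] = v[i] * c;
    }
}
```

The main program would then call `scale_all(vec1_array, NUM_VECTORS, 2)` as an ordinary host function and would contain no OpenACC kernel of its own. When compiled without OpenACC support, the pragmas are ignored and the loop runs serially on the host.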

Note that the "-ta" flag has been deprecated. Please consider using "-acc -gpu=cuda11.2" instead.
