
Runtime cudaErrorInsufficientDriver error from cudaGetDeviceCount when compiling with nvcc, icpc

PROBLEM

I have an FFT-based application that uses FFTW3. I am working on porting the application to a CUDA-based implementation using CUFFT. Compiling and running the FFT core of the application standalone within Nsight works fine. I have moved from there to integrating the device code into my application.
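(For context, the FFT core boils down to wrappers of roughly this shape around CUFFT calls. This is a minimal sketch with hypothetical names, not my actual interface:)

#include <cufft.h>

// Sketch of the kind of wrapper the device-linked library exports;
// "run_fft_core" and its signature are hypothetical.
extern "C" bool run_fft_core(cufftComplex* d_data, int n)
{
    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS)
        return false;
    // In-place forward complex-to-complex transform on device data.
    bool ok = (cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS);
    cufftDestroy(plan);
    return ok;
}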

When I run with the CUFFT core code integrated into my application, cudaGetDeviceCount returns a cudaErrorInsufficientDriver error, although I did not get that error in the Nsight standalone run. The call is made at the beginning of the run, when I'm initializing the GPU.
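For reference, the failing call sits in initialization code of roughly this shape (a minimal sketch; the function name and error handling are illustrative, not my actual code):

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative GPU initialization; this is where the failure shows up.
bool init_gpu()
{
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        // In the .so build this reports cudaErrorInsufficientDriver:
        // "CUDA driver version is insufficient for CUDA runtime version".
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    std::printf("found %d CUDA device(s)\n", device_count);
    return true;
}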

BACKGROUND

I am running on CentOS 6, using CUDA 7.0 on a GeForce GTX 750, with icpc 12.1.5. I have also successfully tested a small example using a GT 610. Both cards work in Nsight (and I've also compiled and run from the command line without problems, though not as extensively as from within Nsight).

To integrate the CUFFT implementation of the FFT core into my application, I compiled and device-linked with nvcc and then used icpc (the Intel C++ Compiler) to compile the host code and to link the device and host code to create a .so. I finally completed that step without errors or warnings (relying on this tutorial).

(The reasoning as to why I'm using a .so has a fair amount of history and additional background. Suffice it to say that making a .so is required for my application.)

The tutorial points out that the compilation steps are different between generating a standalone executable (as I do in Nsight) and generating a device-linked library for inclusion in a .so. To get through compilation, I had to add -lcudart, as described in the tutorial, as well as -lcuda, to my icpc linking call (along with -L flags adding .../cuda-7.0/lib64 and .../cuda-7.0/lib64/stubs as the paths to those libraries).

NOTE: nvcc links in libcudart by default. I'm assuming it does the same for libcuda, since Nsight doesn't include either of these libraries in any of the compile and linking steps. As an aside, I do find it strange that although nvcc links them in by default, they don't show up in a call to ldd on the executable.

I also had to add --compiler-options '-fPIC' to my nvcc commands to avoid the errors described here.

I have seen some chatter (for one example, see this post) about Intel/NVCC compatibility issues, but it looks like they arise at compile time with older versions of NVCC, so...I think I'm OK on that account.

Finally, here are the compile commands for the three .cu files (all are identical except for the name of the .cu file and the name of the .o file):

nvcc
-ccbin g++
-Iinc
-I/path/to/cuda/samples/common/inc
-m64
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_52,code=compute_52
--relocatable-device-code=true
--compile
--compiler-options '-fPIC'
-o my_object_file1.o
-c my_source_code_file1.cu

And here are the flags I pass to the device linking step:

nvcc
-ccbin g++
-Iinc
-I/path/to/cuda/samples/common/inc
-m64
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_52,code=sm_52
-gencode arch=compute_52,code=compute_52
--compiler-options '-fPIC'
--device-link
my_object_file1.o
my_object_file2.o
my_object_file3.o
-o my_device_linked_object_file.o

I probably don't need the -gencode flags for 30, 37, and 52, at least currently, but they shouldn't cause any problems, and eventually I will likely compile that way.

And here are the compile flags (minus the -o flag, and all my -I flags) that I use for the .cc file that calls my CUDA library:

-c
-fpic
-D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64
-fno-operator-names
-D_REENTRANT
-D_POSIX_PTHREAD_SEMANTICS
-DM2KLITE -DGCC_
-std=gnu++98
-O2
-fp-model source
-gcc
-wd1881
-vec-report0

Finally, here are my linking flags:

-pthread
-shared

Any ideas on how to fix this problem?

ANSWER

Don't add .../cuda-7.0/lib64/stubs to LD_LIBRARY_PATH. If you do, you will pick up libcuda.so from there instead of from the driver. (See this post.)
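If you want to confirm that this is what's happening, one quick check (a diagnostic sketch, not part of the build above) is to compare the driver version reported by whichever libcuda.so actually got loaded against the runtime version:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int driver_version = 0, runtime_version = 0;
    // Version reported by the libcuda.so that was actually loaded.
    cudaDriverGetVersion(&driver_version);
    // Version of the CUDA runtime (libcudart) the app was built against.
    cudaRuntimeGetVersion(&runtime_version);
    std::printf("driver %d, runtime %d\n", driver_version, runtime_version);
    // A driver_version of 0 (or one lower than runtime_version) is
    // consistent with cudaErrorInsufficientDriver: the stub library
    // reports no usable driver.
    return 0;
}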
