Why does Theano print “cc1plus: fatal error: cuda_runtime.h: No such file or directory”?

I am trying to use the GPU with Theano. I've read this tutorial.

However, I can't get theano to use the GPU and I don't know how to continue.

Testing machine

$ cat /etc/issue
Welcome to openSUSE 12.1 "Asparagus" - Kernel \r (\l).
$ nvidia-smi -L
GPU 0: Tesla C2075 (S/N: 0324111084577)
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-5.0/lib64:[other]:/usr/local/lib:/usr/lib:/usr/local/X11/lib:[other]
$ find /usr/local/ -name cuda_runtime.h
/usr/local/cuda-5.0/include/cuda_runtime.h
$ echo $C_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ echo $CXX_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ nvidia-smi -a
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
Failed to initialize NVML: Insufficient Permissions
$ echo $PATH
/usr/lib64/mpi/gcc/openmpi/bin:/home/mthoma/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:.:/home/mthoma/bin
$ ls -l /dev/nv*
crw-rw---- 1 root video 195,   0  1. Jul 09:47 /dev/nvidia0
crw-rw---- 1 root video 195, 255  1. Jul 09:47 /dev/nvidiactl
crw-r----- 1 root kmem   10, 144  1. Jul 09:46 /dev/nvram
# nvidia-smi -a

==============NVSMI LOG==============

Timestamp                       : Wed Jul 30 05:13:52 2014
Driver Version                  : 304.33

Attached GPUs                   : 1
GPU 0000:04:00.0
    Product Name                : Tesla C2075
    Display Mode                : Enabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 0324111084577
    GPU UUID                    : GPU-7ea505ef-ad46-bb24-c440-69da9b300040
    VBIOS Version               : 70.10.46.00.05
    Inforom Version
        Image Version           : N/A
        OEM Object              : 1.1
        ECC Object              : 2.0
        Power Management Object : 4.0
    PCI
        Bus                     : 0x04
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x109610DE
        Bus Id                  : 0000:04:00.0
        Sub System Id           : 0x091010DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 1
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : 30 %
    Performance State           : P12
    Clocks Throttle Reasons     : N/A
    Memory Usage
        Total                   : 5375 MB
        Used                    : 39 MB
        Free                    : 5336 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 5 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 0
            Double Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 0
        Aggregate
            Single Bit            
                Device Memory   : 133276
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 133276
            Double Bit            
                Device Memory   : 203730
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 203730
    Temperature
        Gpu                     : 58 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 33.83 W
        Power Limit             : 225.00 W
        Default Power Limit     : N/A
        Min Power Limit         : N/A
        Max Power Limit         : N/A
    Clocks
        Graphics                : 50 MHz
        SM                      : 101 MHz
        Memory                  : 135 MHz
    Applications Clocks
        Graphics                : N/A
        Memory                  : N/A
    Max Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Compute Processes           : None

CUDA sample

Compiling and executing worked as the super user (tested with cuda/C/0_Simple/simpleMultiGPU):

# ldconfig /usr/local/cuda-5.0/lib64/
# ./simpleMultiGPU 
[simpleMultiGPU] starting...

CUDA-capable device count: 1
Generating input data...

Computing with 1 GPUs...
  GPU Processing time: 27.814000 (ms)

Computing with Host CPU...

Comparing GPU and Host CPU results...
  GPU sum: 16777296.000000
  CPU sum: 16777294.395033
  Relative difference: 9.566307E-08 

[simpleMultiGPU] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

When I try this as a normal user, I get:

$ ./simpleMultiGPU 
[simpleMultiGPU] starting...

CUDA error at simpleMultiGPU.cu:87 code=38(cudaErrorNoDevice) "cudaGetDeviceCount(&GPU_N)" 
CUDA-capable device count: 0
Generating input data...

Floating point exception

How can I get CUDA to work for non-super users?

Testing code

The following code is from "Testing Theano with GPU":

#!/usr/bin/env python
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'
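
The tutorial runs this kind of script with Theano's flags set on the command line. A sketch of how I invoke it, assuming the script above is saved as check_gpu.py (a hypothetical file name):

# run the test script with the GPU requested via THEANO_FLAGS (same settings as the .theanorc below)
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check_gpu.py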

The error message

The complete error message is much too long to post here. A longer version is at http://pastebin.com/eT9vbk7M, but I think the relevant part is:

cc1plus: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -g -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC -Xlinker -rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray -Xlinker -rpath,/usr/local/cuda-5.0/lib -Xlinker -rpath,/usr/local/cuda-5.0/lib64 -I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda -I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -o /home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so mod.cu -L/usr/local/cuda-5.0/lib -L/usr/local/cuda-5.0/lib64 -L/usr/lib64 -lpython2.7 -lcublas -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available

The standard stream gives:

['nvcc', '-shared', '-g', '-O3', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC', '-Xlinker', '-rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib64', '-I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include', '-I/usr/include/python2.7', '-o', '/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-L/usr/local/cuda-5.0/lib', '-L/usr/local/cuda-5.0/lib64', '-L/usr/lib64', '-lpython2.7', '-lcublas', '-lcudart']
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.25972604752 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu

.theanorc

$ cat .theanorc 
[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda-5.0

As some comments pointed out, the problem is the permissions on /dev/nvidia*. As others noted, this means the devices do not get initialized correctly during startup. Normally this happens when the GUI is started; my guess is that you didn't enable or install one, so you probably have a headless server.

To fix this, just run nvidia-smi as root. It will detect that the devices are not initialized correctly and fix them. root has the permission to fix this; a normal user does not. That is why it works as root (it gets fixed automatically) but not as a normal user.
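
A related workaround (my assumption, not part of the fix above): the ls -l output in the question shows the device files owned by root:video with group read/write, so putting the non-root account into the video group also gets rid of the permission error once the devices exist. A sketch, assuming the username mthoma from the paths in the question:

# sketch: give the regular user access to /dev/nvidia* through the 'video' group
# (assumes the devices stay owned root:video with mode crw-rw----, as ls -l shows above)
sudo usermod -a -G video mthoma   # 'mthoma' is taken from the question's home directory
# log out and back in (or run 'newgrp video') so the new group membership takes effect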

This fix needs to be applied each time the computer boots. To automate it, you can create, as root, the file /etc/init.d/nvidia-gpu-config with this content:

#!/bin/sh
#
# nvidia-gpu-config    Start the correct initialization of nvidia GPU driver.
#
# chkconfig: - 90 90
# description:  Init gpu to wanted states

# sudo /sbin/chkconfig --add nvidia-gpu-config
#

case $1 in
'start')
nvidia-smi
;;
esac
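
Make the script executable first (my addition; chkconfig-managed init scripts need the execute bit):

chmod 755 /etc/init.d/nvidia-gpu-config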

Then, as root, run this command: /sbin/chkconfig --add nvidia-gpu-config.

UPDATE: This works for operating systems that use the SysV init system. If your system uses systemd, I don't know whether it works.
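
For systemd-based systems, a rough equivalent would be a one-shot service that runs nvidia-smi once at boot. A minimal sketch, untested, assuming nvidia-smi is at /usr/bin/nvidia-smi and using a unit name I made up:

# sketch: create a one-shot systemd unit that initializes the GPU device files at boot
cat > /etc/systemd/system/nvidia-gpu-config.service <<'EOF'
[Unit]
Description=Initialize NVIDIA GPU device files

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi

[Install]
WantedBy=multi-user.target
EOF
systemctl enable nvidia-gpu-config.service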

Try exporting C_INCLUDE_PATH pointing at the CUDA toolkit include directory on your system, for example:

export C_INCLUDE_PATH=${C_INCLUDE_PATH}:/usr/local/cuda/include
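
Note that cc1plus is GCC's C++ front end, which honors CPLUS_INCLUDE_PATH (not CXX_INCLUDE_PATH), so it is worth setting that variable as well. A sketch to make both settings persistent and to check that the header is now found, assuming a bash login shell and the CUDA 5.0 path from the question:

# append the include paths to the shell startup file (assumes bash and the CUDA 5.0 path above)
echo 'export C_INCLUDE_PATH=${C_INCLUDE_PATH}:/usr/local/cuda-5.0/include' >> ~/.bashrc
echo 'export CPLUS_INCLUDE_PATH=${CPLUS_INCLUDE_PATH}:/usr/local/cuda-5.0/include' >> ~/.bashrc
source ~/.bashrc
# quick check: the C++ preprocessor should now locate cuda_runtime.h
echo '#include <cuda_runtime.h>' | g++ -E -x c++ - > /dev/null && echo 'header found'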
