
Why does Theano print “cc1plus: fatal error: cuda_runtime.h: No such file or directory”?

I am trying to use the GPU with Theano. I've read this tutorial.

However, I can't get Theano to use the GPU, and I don't know how to continue.

Testing machine

$ cat /etc/issue
Welcome to openSUSE 12.1 "Asparagus" - Kernel \r (\l).
$ nvidia-smi -L
GPU 0: Tesla C2075 (S/N: 0324111084577)
$ echo $LD_LIBRARY_PATH
/usr/local/cuda-5.0/lib64:[other]:/usr/local/lib:/usr/lib:/usr/local/X11/lib:[other]
$ find /usr/local/ -name cuda_runtime.h
/usr/local/cuda-5.0/include/cuda_runtime.h
$ echo $C_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ echo $CXX_INCLUDE_PATH
/usr/local/cuda-5.0/include/
$ nvidia-smi -a
NVIDIA: could not open the device file /dev/nvidiactl (Permission denied).
Failed to initialize NVML: Insufficient Permissions
$ echo $PATH
/usr/lib64/mpi/gcc/openmpi/bin:/home/mthoma/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:.:/home/mthoma/bin
$ ls -l /dev/nv*
crw-rw---- 1 root video 195,   0  1. Jul 09:47 /dev/nvidia0
crw-rw---- 1 root video 195, 255  1. Jul 09:47 /dev/nvidiactl
crw-r----- 1 root kmem   10, 144  1. Jul 09:46 /dev/nvram
# nvidia-smi -a

==============NVSMI LOG==============

Timestamp                       : Wed Jul 30 05:13:52 2014
Driver Version                  : 304.33

Attached GPUs                   : 1
GPU 0000:04:00.0
    Product Name                : Tesla C2075
    Display Mode                : Enabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 0324111084577
    GPU UUID                    : GPU-7ea505ef-ad46-bb24-c440-69da9b300040
    VBIOS Version               : 70.10.46.00.05
    Inforom Version
        Image Version           : N/A
        OEM Object              : 1.1
        ECC Object              : 2.0
        Power Management Object : 4.0
    PCI
        Bus                     : 0x04
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x109610DE
        Bus Id                  : 0000:04:00.0
        Sub System Id           : 0x091010DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 1
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : 30 %
    Performance State           : P12
    Clocks Throttle Reasons     : N/A
    Memory Usage
        Total                   : 5375 MB
        Used                    : 39 MB
        Free                    : 5336 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 5 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 0
            Double Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 0
        Aggregate
            Single Bit            
                Device Memory   : 133276
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 133276
            Double Bit            
                Device Memory   : 203730
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : N/A
                Total           : 203730
    Temperature
        Gpu                     : 58 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 33.83 W
        Power Limit             : 225.00 W
        Default Power Limit     : N/A
        Min Power Limit         : N/A
        Max Power Limit         : N/A
    Clocks
        Graphics                : 50 MHz
        SM                      : 101 MHz
        Memory                  : 135 MHz
    Applications Clocks
        Graphics                : N/A
        Memory                  : N/A
    Max Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Compute Processes           : None

CUDA sample

Compiling and executing worked as the superuser (tested with cuda/C/0_Simple/simpleMultiGPU):

# ldconfig /usr/local/cuda-5.0/lib64/
# ./simpleMultiGPU 
[simpleMultiGPU] starting...

CUDA-capable device count: 1
Generating input data...

Computing with 1 GPUs...
  GPU Processing time: 27.814000 (ms)

Computing with Host CPU...

Comparing GPU and Host CPU results...
  GPU sum: 16777296.000000
  CPU sum: 16777294.395033
  Relative difference: 9.566307E-08 

[simpleMultiGPU] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

When I try this as a normal user, I get:

$ ./simpleMultiGPU 
[simpleMultiGPU] starting...

CUDA error at simpleMultiGPU.cu:87 code=38(cudaErrorNoDevice) "cudaGetDeviceCount(&GPU_N)" 
CUDA-capable device count: 0
Generating input data...

Floating point exception

How can I get CUDA to work for non-superusers?

Testing code

The following code is from "Testing Theano with GPU":

#!/usr/bin/env python
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
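# T.Elemwise is the CPU element-wise op; if the graph had been moved to the GPU it would contain GpuElemwise instead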
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'

The error message

The complete error message is much too long to post here. A longer version is at http://pastebin.com/eT9vbk7M, but I think the relevant part is:

cc1plus: fatal error: cuda_runtime.h: No such file or directory
compilation terminated.
ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 1, 'for cmd', 'nvcc -shared -g -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC -Xlinker -rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray -Xlinker -rpath,/usr/local/cuda-5.0/lib -Xlinker -rpath,/usr/local/cuda-5.0/lib64 -I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda -I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include -I/usr/include/python2.7 -o /home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so mod.cu -L/usr/local/cuda-5.0/lib -L/usr/local/cuda-5.0/lib64 -L/usr/lib64 -lpython2.7 -lcublas -lcudart')
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available

The standard output stream gives:

['nvcc', '-shared', '-g', '-O3', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=bcb411d72e41f81f3deabfc6926d9728,-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY,-D NPY_ARRAY_ALIGNED=NPY_ALIGNED,-D NPY_ARRAY_WRITEABLE=NPY_WRITEABLE,-D NPY_ARRAY_UPDATE_ALL=NPY_UPDATE_ALL,-D NPY_ARRAY_C_CONTIGUOUS=NPY_C_CONTIGUOUS,-D NPY_ARRAY_F_CONTIGUOUS=NPY_F_CONTIGUOUS,-fPIC', '-Xlinker', '-rpath,/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib', '-Xlinker', '-rpath,/usr/local/cuda-5.0/lib64', '-I/usr/local/lib/python2.7/site-packages/Theano-0.6.0rc1-py2.7.egg/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/site-packages/numpy-1.6.2-py2.7-linux-x86_64.egg/numpy/core/include', '-I/usr/include/python2.7', '-o', '/home/mthoma/.theano/compiledir_Linux-3.1.10-1.16-desktop-x86_64-with-SuSE-12.1-x86_64-x86_64-2.7.2/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-L/usr/local/cuda-5.0/lib', '-L/usr/local/cuda-5.0/lib64', '-L/usr/lib64', '-lpython2.7', '-lcublas', '-lcudart']
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.25972604752 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu
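
One way to narrow this down (a sketch; /tmp/check_cuda.cu is just a hypothetical throwaway file) is to check whether nvcc on its own can find cuda_runtime.h, outside of Theano's compile step:

$ printf '#include <cuda_runtime.h>\nint main(void) { return 0; }\n' > /tmp/check_cuda.cu
$ nvcc /tmp/check_cuda.cu -o /tmp/check_cuda && echo "nvcc finds cuda_runtime.h"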

.theanorc

$ cat .theanorc 
[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda-5.0

As some comments pointed out, the problem is the permissions on /dev/nvidia*. As some also noted, this means the devices do not get initialized correctly during startup. Normally this happens when the GUI is started; my guess is that you did not install or enable one, so you probably have a headless server.

To fix this, just run nvidia-smi as root. It will detect that the devices are not initialized correctly and fix them. root has the permission to do this; a normal user does not. That is why it works as root (the problem gets fixed automatically) but not as a normal user.
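
Concretely, the manual fix looks like this (assuming sudo is configured; otherwise switch to root with su first), using the sample from above to verify:

# initialize the devices once as root
sudo nvidia-smi
# the CUDA sample should now also pass as a normal user
./simpleMultiGPU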

This fix needs to be applied each time the computer boots. To automate it, you can create (as root) the file /etc/init.d/nvidia-gpu-config with the following content:

#!/bin/sh
#
# nvidia-gpu-config    Start the correct initialization of the NVIDIA GPU driver.
#
# chkconfig: - 90 90
# description: Initialize the GPU to the wanted state
#
# Register the service with: sudo /sbin/chkconfig --add nvidia-gpu-config
#

case "$1" in
'start')
    nvidia-smi
    ;;
esac

Then, as root, run this command: /sbin/chkconfig --add nvidia-gpu-config.

UPDATE: This works for operating systems that use the SysV init system. If your system uses systemd as its init system, I don't know whether it works.
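
For a systemd-based system, a rough equivalent would be a oneshot unit that runs nvidia-smi at boot (a sketch only, untested here; the unit name nvidia-gpu-config.service and the /usr/bin/nvidia-smi path are assumptions to adjust for your system):

cat > /etc/systemd/system/nvidia-gpu-config.service <<'EOF'
[Unit]
Description=Initialize the NVIDIA GPU device files at boot

[Service]
Type=oneshot
# adjust the path if `which nvidia-smi` points somewhere else
ExecStart=/usr/bin/nvidia-smi

[Install]
WantedBy=multi-user.target
EOF
systemctl enable nvidia-gpu-config.service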

Try exporting C_INCLUDE_PATH to point at the CUDA toolkit include directory on your system, for example:

export C_INCLUDE_PATH=${C_INCLUDE_PATH}:/usr/local/cuda/include
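
If Theano's compile subprocess does not pick up that environment variable, an alternative sketch (an assumption based on the [nvcc] flags option of Theano's old CUDA backend, with paths taken from the question) is to pass the include directory to nvcc directly via .theanorc:

cat >> ~/.theanorc <<'EOF'

[nvcc]
flags = -I/usr/local/cuda-5.0/include
EOF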
