TensorFlow-gpu-2.0.0rc2 cannot find cuda-10.1 libraries and skips registered GPU devices

I am using an NVIDIA Tesla V100-SXM2-32GB on a system where I am a non-admin user, so I cannot change the CUDA version. The system currently has CUDA 10.1 installed, and I am trying to get TensorFlow to run with it. After installing TensorFlow 2.0.0rc2 (with cudnn-7.6.4 and cudatoolkit-10.1.243), I get the error reported below (in eager execution mode, which is enabled by default). The paths to the CUDA libraries are correctly exported.

According to the official documentation and this post, TensorFlow currently supports CUDA 10.0. Is anybody aware of a version (even an alpha) that runs with CUDA 10.1?

python -c "import tensorflow as tf; tf.zeros(10)"

returns

2019-11-10 11:55:36.118647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-10 11:55:39.393230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1a:00.0
2019-11-10 11:55:39.395456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1c:00.0
2019-11-10 11:55:39.397553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1d:00.0
2019-11-10 11:55:39.399647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 3 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1e:00.0
2019-11-10 11:55:39.399986: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400135: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400274: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400414: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400552: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400687: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.405250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-10 11:55:39.405367: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-11-10 11:55:39.405848: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-11-10 11:55:39.412764: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2019-11-10 11:55:39.412951: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c0a4adfd0 executing computations on platform Host. Devices:
2019-11-10 11:55:39.413028: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2019-11-10 11:55:40.213011: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c0a4b0850 executing computations on platform CUDA. Devices:
2019-11-10 11:55:40.213144: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213208: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213262: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (2): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213312: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (3): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-10 11:55:40.213647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
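
The warnings in the log show that this TensorFlow build asks the dynamic loader for libraries with a `.so.10.0` suffix, which a CUDA 10.1 installation does not provide under those names. A minimal sketch for checking which of those names the loader can actually open, using only the standard-library `ctypes` module (the helper name `missing_libraries` is made up for this illustration, and the library list is copied from the warnings above):

```python
import ctypes

# Base names copied from the "Could not load dynamic library" warnings in the log.
CUDA_LIBS = [
    "libcudart",
    "libcublas",
    "libcufft",
    "libcurand",
    "libcusolver",
    "libcusparse",
]


def missing_libraries(version, names=CUDA_LIBS):
    """Return the sonames (e.g. 'libcudart.so.10.0') that cannot be opened.

    Hypothetical helper: it asks the dynamic loader to open each name and
    collects the ones that fail.
    """
    missing = []
    for name in names:
        soname = "{}.so.{}".format(name, version)
        try:
            ctypes.CDLL(soname)  # raises OSError if the loader cannot find it
        except OSError:
            missing.append(soname)
    return missing


if __name__ == "__main__":
    # Compare what is reachable for the version TensorFlow requests (10.0)
    # against the version installed on the system (10.1).
    for version in ("10.0", "10.1"):
        print(version, "missing:", missing_libraries(version))
```

Run it in the same shell, with the same `LD_LIBRARY_PATH`, as the failing `python -c` command. Any name listed as missing for `10.0` is one that TensorFlow will also fail to load.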

If you are using TensorFlow on Windows, you no longer need to install the CUDA toolkit and cuDNN yourself ^_^

Just run the following conda commands and conda will take care of the respective packages:

Create a new environment:

conda create -n [name] python=3.6

conda activate [name]

Then install TensorFlow:

conda install -c conda-forge tensorflow-gpu==1.14

conda will check and install the packages this TensorFlow build depends on. Cheers!
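
To confirm that the new environment actually sees the GPUs, a quick check can be run inside it. This is only a sketch: the wrapper name `gpu_available` is made up here, and it relies on `tf.test.is_gpu_available()`, which exists in TensorFlow 1.14 and 2.0.

```python
def gpu_available():
    """Return True/False for GPU visibility, or None if TensorFlow is not installed.

    Hypothetical wrapper around tf.test.is_gpu_available().
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    print("GPU available:", gpu_available())
```

If this prints `False`, check the log for the same "Could not load dynamic library" warnings shown in the question.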
