TensorFlow-gpu-2.0.0rc2 cannot find CUDA 10.1 libraries and skips registering GPU devices

I am using an NVIDIA Tesla V100-SXM2-32GB on a system where I am a non-admin user, so I cannot change the CUDA version. The CUDA version currently installed on the system is 10.1, and I am trying to get TensorFlow to run with this version. After installing TensorFlow 2.0.0rc2 (with cudnn-7.6.4 and cudatoolkit-10.1.243), I get the error reported below (with eager execution, which is enabled by default). The paths to the CUDA libraries are correctly exported.
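One way to double-check that the exported paths really contain a CUDA runtime is to scan each LD_LIBRARY_PATH entry for libcudart files. Below is a minimal sketch in Python; the helper name find_cuda_runtime_dirs and the libcudart.so* pattern are my own choices, not anything from TensorFlow. It also collapses the duplicate entries that repeated exports produce (the LD_LIBRARY_PATH printed in the log below lists /usr/local/cuda/lib64 four times), and for simplicity it skips empty entries.

```python
import glob
import os


def find_cuda_runtime_dirs(ld_library_path=None, pattern="libcudart.so*"):
    """Map each unique LD_LIBRARY_PATH directory (in search order) to the
    file names in it that match `pattern`; directories with no match map to []."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    unique_dirs = []
    for entry in ld_library_path.split(":"):  # LD_LIBRARY_PATH is colon-separated
        # Skip empty entries and duplicates left behind by repeated `export` lines.
        if entry and entry not in unique_dirs:
            unique_dirs.append(entry)
    result = {}
    for directory in unique_dirs:
        matches = sorted(glob.glob(os.path.join(directory, pattern)))
        result[directory] = [os.path.basename(m) for m in matches]
    return result


if __name__ == "__main__":
    for directory, files in find_cuda_runtime_dirs().items():
        print(directory, "->", files or "no match")
```

If the only match is something like libcudart.so.10.1 while the log asks for libcudart.so.10.0, the paths are exported correctly but point at a different CUDA release than the one the binary was linked against.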

According to the official documentation and this post, TensorFlow currently supports CUDA 10.0. Is anybody aware of a version (even an alpha) that could run with CUDA 10.1?

python -c "import tensorflow as tf; tf.zeros(10)"

returns

2019-11-10 11:55:36.118647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-10 11:55:39.393230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1a:00.0
2019-11-10 11:55:39.395456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1c:00.0
2019-11-10 11:55:39.397553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1d:00.0
2019-11-10 11:55:39.399647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 3 with properties: 
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1e:00.0
2019-11-10 11:55:39.399986: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400135: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400274: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400414: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400552: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.400687: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda-8.0/lib64
2019-11-10 11:55:39.405250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-10 11:55:39.405367: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-11-10 11:55:39.405848: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-11-10 11:55:39.412764: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2019-11-10 11:55:39.412951: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c0a4adfd0 executing computations on platform Host. Devices:
2019-11-10 11:55:39.413028: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2019-11-10 11:55:40.213011: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c0a4b0850 executing computations on platform CUDA. Devices:
2019-11-10 11:55:40.213144: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213208: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213262: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (2): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213312: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (3): Tesla V100-SXM2-32GB, Compute Capability 7.0
2019-11-10 11:55:40.213562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-10 11:55:40.213647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
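The warnings show that this binary tries to dlopen sonames ending in .so.10.0, so a file such as libcudart.so.10.1 does not match, even when its directory is on LD_LIBRARY_PATH. A rough way to reproduce that lookup outside TensorFlow is to load the same names through Python's ctypes, which calls dlopen on Linux. This is a sketch assuming Linux; check_dlopen is a hypothetical helper, and libcudnn.so.7 is included as a control because the log reports it loaded successfully.

```python
import ctypes

# Sonames copied verbatim from the dso_loader messages in the log above.
SONAMES_FROM_LOG = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",  # control: reported as "Successfully opened" in the log
]


def check_dlopen(sonames):
    """Try to load each soname; map it to None on success, else the error text."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # uses the dynamic linker's normal search order
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_dlopen(SONAMES_FROM_LOG).items():
        print(f"{name}: {'OK' if error is None else error}")
```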

If you are using TensorFlow on Windows, you no longer need to install the CUDA toolkit and cuDNN libraries yourself (the NVIDIA GPU driver itself still has to be installed) ^_^

Just run the following commands with conda, and it will take care of the required packages by itself:

Create a new environment:

conda create -n [name] python=3.6

conda activate [name]

Then run:

conda install -c conda-forge tensorflow-gpu==1.14

conda will check your system and install the packages it needs. Cheers!
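To confirm that conda actually placed a CUDA runtime inside the activated environment, here is a small sketch that searches the environment prefix (CONDA_PREFIX, which conda activate sets). The directory layouts (lib/ on Linux, Library/bin/ on Windows) and the file-name patterns are assumptions about how the cudatoolkit package is laid out, and cuda_runtime_in_env is a hypothetical helper.

```python
import glob
import os

# Assumed locations/patterns of the CUDA runtime inside a conda environment.
RUNTIME_GLOBS = [
    os.path.join("lib", "libcudart.so*"),               # Linux (assumption)
    os.path.join("Library", "bin", "cudart64_*.dll"),   # Windows (assumption)
]


def cuda_runtime_in_env(prefix=None):
    """Return sorted file names of CUDA runtime libraries under a conda prefix."""
    if prefix is None:
        prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    if not prefix:
        return []  # no active environment and no explicit prefix given
    found = []
    for pattern in RUNTIME_GLOBS:
        found.extend(glob.glob(os.path.join(prefix, pattern)))
    return sorted(os.path.basename(p) for p in found)


if __name__ == "__main__":
    print(cuda_runtime_in_env() or "no CUDA runtime found in the active environment")
```

An empty result after the install would suggest that conda resolved a CPU-only build or that the runtime lives somewhere other than the assumed paths.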
