Why is my Python code not running on the GPU? tensorflow-gpu, CUDA, and cuDNN are installed

Question

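I am a beginner at running Python code on a GPU. I have CNN code that I want to run on the GPU. tensorflow-gpu, CUDA, and cuDNN are installed on my laptop, but the Python code does not execute on the GPU.

Below is everything I have tried, along with the output.

Code: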

 pip freeze | grep tensorflow

Output:

 tensorflow==2.0.0
 tensorflow-estimator==2.0.0
 tensorflow-gpu==2.0.0

Code:

 nvcc --version

Output:

 nvcc: NVIDIA (R) Cuda compiler driver
 Copyright (c) 2005-2019 NVIDIA Corporation
 Built on Fri_Feb__8_19:08:17_PST_2019
 Cuda compilation tools, release 10.1, V10.1.105

Code:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Output:

 #define CUDNN_MAJOR 7
 #define CUDNN_MINOR 5
 #define CUDNN_PATCHLEVEL 0
 #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
 #include "driver_types.h"

Code:

 from __future__ import absolute_import, division, print_function, unicode_literals
 import tensorflow as tf
 print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Output:

 Num GPUs Available: 0

Code:

import tensorflow
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Output:

 2019-10-16 22:11:15.280922: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
 2019-10-16 22:11:15.484734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
 2019-10-16 22:11:15.508127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d4c60 executing computations on platform Host. Devices:
 2019-10-16 22:11:15.508212: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
 2019-10-16 22:11:15.784006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
 2019-10-16 22:11:15.785226: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d6ad0 executing computations on platform CUDA. Devices:
 2019-10-16 22:11:15.785278: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
 2019-10-16 22:11:15.785605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
 2019-10-16 22:11:15.786528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
 name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
 pciBusID: 0000:01:00.0
 2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
 2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
 2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
 2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
 2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
 2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
 2019-10-16 22:11:15.788010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
 2019-10-16 22:11:15.788036: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
 Skipping registering GPU devices...
 2019-10-16 22:11:15.788073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
 2019-10-16 22:11:15.788094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
 2019-10-16 22:11:15.788111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
 [name: "/device:CPU:0" device_type: "CPU" memory_limit: 268435456 locality { } incarnation: 7400412130462543104,
 name: "/device:XLA_CPU:0" device_type: "XLA_CPU" memory_limit: 17179869184 locality { } incarnation: 10419596086097903998 physical_device_desc: "device: XLA_CPU device",
 name: "/device:XLA_GPU:0" device_type: "XLA_GPU" memory_limit: 17179869184 locality { } incarnation: 10970348491339008844 physical_device_desc: "device: XLA_GPU device" ]

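I have consulted several websites, which basically say that if you have a GPU and tensorflow-gpu installed, the program will automatically detect the GPU and run the code on it. I also know there are similar questions on StackOverflow; the code above is what I tried after reading their answers. I also tried this snippet from the official TensorFlow 2.0 site: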

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

The output is:

RuntimeError: Device placement logging must be set at program startup

Why is my program not running on the GPU?

Answer 1

If you look here:

2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/

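The log says TensorFlow is looking for the CUDA 10.0 libraries, but what it finds on your system is CUDA 10.1. So the first step is to uninstall and remove CUDA 10.1 and install CUDA 10.0. Also remove tensorflow and keep only tensorflow-gpu. For all other versions, follow the exact recommendations here.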
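To see at a glance which libraries and which CUDA version TensorFlow was looking for, the warnings can be scanned with a few lines of Python. This is only a sketch: the helper names `missing_cuda_libraries` and `expected_versions` are made up for illustration, and the code assumes the warning text has the same `Could not load dynamic library '...'` shape as the log quoted above.

```python
import re

# Matches the dso_loader warning quoted above, e.g.
# "Could not load dynamic library 'libcudart.so.10.0'; dlerror: ..."
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_cuda_libraries(log_text):
    """Return the library file names that failed to load, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def expected_versions(library_names):
    """Collect the version suffix after '.so.' from each name, e.g. '10.0'."""
    versions = set()
    for name in library_names:
        _, separator, version = name.partition(".so.")
        if separator and version:
            versions.add(version)
    return versions


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python your_script.py 2>&1 | python scan_log.py
    names = missing_cuda_libraries(sys.stdin.read())
    print("Missing libraries:", names)
    print("Versions TensorFlow asked for:", sorted(expected_versions(names)))
```

If the version printed here differs from what `nvcc --version` reports, the installed toolkit and the TensorFlow build do not match, which is the situation described in this answer.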

Let us know whether this solves your problem.

Answer 2

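Rishabh Sahrawat's answer worked for me. It took me a long time to figure out how to uninstall CUDA 10.1 and install CUDA 10.0. Although that answer was very informative, I still struggled to get everything installed correctly, because I ran into package errors (sigh), NVIDIA driver errors, dpkg errors, and so on. I thought it would be good to collect everything in one place to guide others (beginners like me) who may face the same difficulties. I tried the following commands to fix the errors, and they worked for me. Some of them already appear in the question, but I repeat them here. I hope this helps.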

1. How do I uninstall CUDA?

dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 sudo dpkg --purge --force-all
sudo apt-get remove cuda-*
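Because `--purge --force-all` is destructive, it may be worth previewing which packages the pipeline would select before running it. The sketch below mirrors the `grep cuda- | awk '{print $2}'` part in Python; `cuda_packages` is a hypothetical helper name, and the code assumes the usual `dpkg -l` layout where the package name is the second whitespace-separated column.

```python
import shutil
import subprocess


def cuda_packages(dpkg_listing):
    """Return the second column of every `dpkg -l` line that contains 'cuda-'."""
    names = []
    for line in dpkg_listing.splitlines():
        if "cuda-" not in line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            names.append(fields[1])
    return names


if __name__ == "__main__":
    if shutil.which("dpkg") is None:
        print("dpkg not found; this preview only applies to Debian/Ubuntu systems")
    else:
        listing = subprocess.run(
            ["dpkg", "-l"], stdout=subprocess.PIPE, universal_newlines=True
        ).stdout
        for name in cuda_packages(listing):
            print(name)  # nothing is removed, this only lists candidates
```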

2. How do I check whether CUDA is uninstalled/installed?

Command:

nvcc --version

Output (if uninstalled):

command 'nvcc' not found, but can be installed with sudo apt install nvidia-cuda-toolkit

Output (if installed):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
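If you want to do this check from a script, the release number can be pulled out of the `nvcc --version` text. This is a minimal sketch; the function names are my own, and it assumes the output contains a `release X.Y` fragment as in the sample above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the CUDA release (e.g. '10.0') from `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    completed = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_nvcc_release(completed.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print("nvcc not found" if release is None else "CUDA release: " + release)
```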

3. If you get the error bash: /usr/bin/nvcc: No such file or directory

Check the paths in .bashrc. You can also refer to this link.
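One way to check whether the paths actually point at the right CUDA version is to look for a specific library file in each directory of `LD_LIBRARY_PATH`. The sketch below does that; `find_in_search_path` is a hypothetical helper, and `libcudart.so.10.0` is used only because it is the first file the log in the question failed to load.

```python
import os


def find_in_search_path(file_name, search_path):
    """Return the directories of a colon-separated search path that contain file_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.exists(os.path.join(directory, file_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = find_in_search_path("libcudart.so.10.0", ld_path)
    if hits:
        print("Found libcudart.so.10.0 in:", hits)
    else:
        print("libcudart.so.10.0 is not in any LD_LIBRARY_PATH directory")
```

An empty result after installing CUDA 10.0 would suggest that .bashrc still exports the old directories.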

4. How do I remove an old version of the NVIDIA driver?

Command:

sudo apt-get --purge remove "*nvidia*"

5. How do I check whether the driver is installed?

Command:

nvidia-smi
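The same check can be scripted, for example at the start of a training job. This is a rough sketch under the assumption that a working driver makes `nvidia-smi` exit with status 0; `driver_responds` is a name I made up.

```python
import shutil
import subprocess


def driver_responds(command="nvidia-smi"):
    """Return True if the command is on PATH and exits with status 0, else False."""
    if shutil.which(command) is None:
        return False
    try:
        completed = subprocess.run(
            [command],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    print("NVIDIA driver responds:", driver_responds())
```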

6. If you get the error message "Sub-process /usr/bin/dpkg returned an error code (1)"

dpkg error

You can also try:

sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev
apt --fix-broken install # (if it doesn't work, try it in root)

7. How do I install CUDA?

In the CUDA installation instructions, I used the following command instead of step 4:

sudo apt-get install cuda-10-0

8. How do I install cuDNN?

Download the cuDNN library for Linux.

# Unpack the archive

tar -zxvf cudnn-10.0-linux-x64-v7.6.4.38.tgz

# Move the unpacked contents to your CUDA directory

sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
sudo cp  cuda/include/cudnn.h /usr/local/cuda-10.0/include/

# Give read access to all users

sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
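To confirm which cuDNN version ended up in the CUDA directory, you can read the version macros from cudnn.h, which is what the `grep CUDNN_MAJOR -A 2` command in the question does. Here is a small Python sketch of the same idea; `parse_cudnn_version` is a hypothetical helper, and the header path in the usage part is the one used in the copy commands above.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the CUDNN_* macros in cudnn.h, or None."""
    parts = []
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        # "\s+(\d+)" skips the CUDNN_VERSION line, where the macro name is
        # followed by " * 1000" rather than a number.
        match = re.search(r"define\s+CUDNN_%s\s+(\d+)" % key, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header_path = "/usr/local/cuda-10.0/include/cudnn.h"
    try:
        with open(header_path) as handle:
            print("cuDNN version:", parse_cudnn_version(handle.read()))
    except OSError as error:
        print("Could not read", header_path, "-", error)
```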

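You can also use the following links (they did not work for me, but they are worth a try):

I ended up installing CUDA 10.1 when I followed the steps in the link.
I could not create the new file /etc/profile.d/cuda.sh as suggested in this link.
This link is also good.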

After installing everything and uninstalling tensorflow (keeping only tensorflow-gpu), the code will run on the GPU.

How to make sure TensorFlow is using the GPU
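A quick way to verify this from Python is to count the GPUs TensorFlow reports, using the same `tf.config.experimental.list_physical_devices('GPU')` call as in the question. The wrapper below is just a sketch; `count_visible_gpus` is my own name, and returning `None` when TensorFlow cannot be imported is a choice made for this example.

```python
def count_visible_gpus():
    """Return how many GPUs TensorFlow can see, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.experimental.list_physical_devices("GPU"))


if __name__ == "__main__":
    count = count_visible_gpus()
    if count is None:
        print("TensorFlow is not importable in this environment")
    elif count == 0:
        print("TensorFlow imported, but no GPU was registered; check the log for missing libraries")
    else:
        print("Num GPUs Available:", count)
```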

Note: if you get an import error when importing tensorflow, this is what I did, and it worked for me:

pip uninstall tensorflow
pip uninstall tensorflow-gpu

pip install tensorflow-gpu
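To confirm that only tensorflow-gpu is left afterwards, you can scan the `pip freeze` output. The sketch below is an illustration only; both function names are hypothetical, and it treats having both `tensorflow` and `tensorflow-gpu` installed as the state the answers here advise against.

```python
def installed_tensorflow_packages(freeze_text):
    """Map 'tensorflow' / 'tensorflow-gpu' to their versions from `pip freeze` text."""
    found = {}
    for line in freeze_text.splitlines():
        name, separator, version = line.strip().partition("==")
        if separator and name.lower() in ("tensorflow", "tensorflow-gpu"):
            found[name.lower()] = version
    return found


def has_both_tensorflow_packages(freeze_text):
    """True when both the CPU package and the GPU package are installed."""
    return len(installed_tensorflow_packages(freeze_text)) == 2


if __name__ == "__main__":
    import subprocess
    import sys

    freeze = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    print(installed_tensorflow_packages(freeze))
    print("Both packages installed:", has_both_tensorflow_packages(freeze))
```

On the `pip freeze` output shown in the question, the second function returns True, because both packages were installed at 2.0.0.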

Additional information:

1. Check the Ubuntu kernel version:

uname -sr
uname -r
uname -a

2. Install GCC

Enjoy :)

Answer 3

If none of the above works, try installing tensorflow-gpu with conda instead of pip. For some reason, pip install tensorflow-gpu did not work as expected.

conda install tensorflow-gpu
