Tensorflow GPU / CUDA installation on Ubuntu
I have set up Ubuntu 18.04 and tried to get Tensorflow 2.2 with GPU support working in Python (I have an Nvidia/CUDA graphics card). Even after reading the documentation https://www.tensorflow.org/install/gpu#linux_setup , it failed (see below for details about how it failed).
Question: would you have a canonical "todo" list (starting point: freshly installed Ubuntu server) on how to install tensorflow-gpu and make it work, in a few steps?
Notes:
I have read many similar forum posts, and I think that a canonical "todo" (from a fresh Ubuntu install to a working tensorflow-gpu), with a few steps/bash commands, would be interesting.
The documentation I used involved
export LD_LIBRARY_PATH...
# Add NVIDIA package repository
sudo apt-key adv --fetch-keys http://developer.download...
...
# Install CUDA and tools. Include optional NCCL 2.x
sudo apt install cuda9.0 cuda...
Even after a lot of trial and error (I won't copy/paste all the different errors here, it would be too long), at the end
import tensorflow
always failed. Some reasons included `ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory`. I have already read the relevant question here, and this very long (!) Github issue.
After some more trial and error, import tensorflow works, but it doesn't use the GPU (see also Tensorflow not running on GPU).
Well, I was facing the same problem. The first thing to do is to look up which CUDA version your Tensorflow version requires. In your case, Tensorflow 2.2 requires CUDA 10.1. The correct cuDNN version is also important. In your case it would be cuDNN 7.6. An additional point is the installed Python version. I would recommend Python 3.5-3.8. If any one of these does not match, getting a fully compatible setup is almost impossible.
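The version check described above can be sketched in a few lines of Python. This is only an illustration: the table `TESTED_CONFIGS` and the function `check_setup` are hypothetical names, and the single row it contains is the Tensorflow 2.2 combination mentioned in this answer. Verify any other rows against the official compatibility table before adding them.

```python
import sys

# Only the Tensorflow 2.2 row from the text above is filled in. Add further
# rows by hand from the official compatibility table if you need them.
TESTED_CONFIGS = {
    "2.2": {"cuda": "10.1", "python_min": (3, 5), "python_max": (3, 8)},
}


def check_setup(tf_version, python_version=None):
    """Return (ok, message) for a Tensorflow version string such as "2.2.0"."""
    key = ".".join(tf_version.split(".")[:2])  # "2.2.0" -> "2.2"
    config = TESTED_CONFIGS.get(key)
    if config is None:
        return False, "no table entry for Tensorflow " + key
    if python_version is None:
        python_version = sys.version_info[:2]
    current = tuple(python_version)
    # Tuples compare element by element, so (3, 5) <= (3, 7) <= (3, 8) holds.
    ok = config["python_min"] <= current <= config["python_max"]
    message = "Tensorflow {} expects CUDA {} and Python {}.{}-{}.{}; found Python {}.{}".format(
        key, config["cuda"],
        config["python_min"][0], config["python_min"][1],
        config["python_max"][0], config["python_max"][1],
        current[0], current[1],
    )
    return ok, message


if __name__ == "__main__":
    print(check_setup("2.2.0"))
```

Calling `check_setup("2.2.0")` without a second argument checks the interpreter that is currently running, which is usually the one you intend to install Tensorflow into.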
So if you want a checklist, here you go:
You can find the compatibility check list of Tensorflow and CUDA here
You can find the CUDA Toolkit here
Finally, get cuDNN in the correct version here
That's all.
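Once a CUDA Toolkit is installed, it can help to confirm which release actually ended up on the machine. The following is a minimal sketch, assuming `nvcc` is on the `PATH` and that its `--version` output contains a phrase of the form "release 10.1"; the function names are made up for this example.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the "X.Y" from a "release X.Y" phrase, or return None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not found."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # works on Python 3.5-3.8
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release:", installed_cuda_release())
```

If this prints `None`, either the toolkit is not installed or its `bin` directory is not on the `PATH`.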
I faced the problem as well when using the Google Cloud Platform for two projects involving deep learning. They provide servers with nothing but a freshly installed Ubuntu OS. Based on my experience, I recommend the following steps:
This should work. Your problem is probably that you are using a more recent CUDA version than the one targeted by the current Tensorflow release.
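A CUDA version mismatch typically shows up as an `ImportError` naming a shared library that cannot be opened, as with `libcublas.so.9.0` in the question. One way to see which libraries the dynamic loader can actually find is to try loading them with `ctypes`. This is a sketch only: the library names in `CUDA_10_1_LIBS` are assumptions for a CUDA 10.1 / cuDNN 7 setup and should be replaced with whatever names your error message reports.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


def missing_libraries(names):
    """Return the subset of names that cannot be loaded."""
    return [name for name in names if not can_load(name)]


# Assumed names for CUDA 10.1 / cuDNN 7; adjust to match your own error message.
CUDA_10_1_LIBS = ["libcudart.so.10.1", "libcublas.so.10", "libcudnn.so.7"]

if __name__ == "__main__":
    print("missing:", missing_libraries(CUDA_10_1_LIBS))
```

A library that is installed but still reported as missing usually means its directory is not in `LD_LIBRARY_PATH` or the loader cache.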
To install tensorflow-gpu, the guidelines provided on the official website are very tedious for beginners. Instead, we can follow these simple steps:
Note: the NVIDIA driver must be installed before this (you can verify this using the command nvidia-smi).
Check whether the GPU is used with the given code:
import tensorflow as tf

# gpu_device_name() returns an empty string when no GPU is available
if tf.test.gpu_device_name():
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
    print("not using gpu")
You can find the tutorial at the link below: https://www.pugetsystems.com/labs/hpc/Install-TensorFlow-with-GPU-Support-the-Easy-Way-on-Ubuntu-18-04-without-installing-CUDA-1170/
I would suggest first checking the availability of the GPU using the nvidia-smi command.
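The same check can be scripted, for example as the first step of a setup script. The sketch below assumes `nvidia-smi` supports the `--query-gpu` and `--format` options, as recent drivers do; `list_gpus` is a name made up for this example.

```python
import shutil
import subprocess


def list_gpus():
    """Return one "name, driver_version" string per GPU, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.5-3.8
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "no GPU visible to nvidia-smi")
```

An empty list means the driver has to be fixed first; no Tensorflow or CUDA change will help until `nvidia-smi` sees the card.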
I faced the same issue and was able to resolve it by using a Docker container. You can install Docker using Install Docker Engine on Ubuntu or the Digital Ocean guide (I used this one) How To Install and Use Docker on Ubuntu 18.04.
After that it is simple: just run one of the following commands, depending on your requirements
NV_GPU='0' nvidia-docker run --runtime=nvidia -it -v /path/to/folder:/path/to/folder/for/docker/container nvcr.io/nvidia/tensorflow:17.11
NV_GPU='0' nvidia-docker run --runtime=nvidia -it -v /storage/research/:/storage/research/ nvcr.io/nvidia/tensorflow:20.12-tf2-py3
Here '0' is the GPU number. If you want to use more than one GPU, use '0,1,2' and so on.
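If you launch containers from a script, the command line can be assembled from a list of GPU numbers. This is a small sketch that only builds the string in the same form as the commands above; `build_command` is a hypothetical helper and does not run Docker itself.

```python
def build_command(gpu_ids, host_dir, container_dir, image):
    """Build the nvidia-docker command line for the given GPU numbers."""
    gpus = ",".join(str(gpu_id) for gpu_id in gpu_ids)  # [0, 1, 2] -> "0,1,2"
    return "NV_GPU='{}' nvidia-docker run --runtime=nvidia -it -v {}:{} {}".format(
        gpus, host_dir, container_dir, image
    )


if __name__ == "__main__":
    print(build_command(
        [0, 1, 2],
        "/storage/research/",
        "/storage/research/",
        "nvcr.io/nvidia/tensorflow:20.12-tf2-py3",
    ))
```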
Hope this solves the issue.