Tensorflow not running on GPU

I have already spent a considerable amount of time digging around on Stack Overflow and elsewhere looking for the answer, but couldn't find anything.

Hi all,

I am running TensorFlow with Keras on top. I am 90% sure I installed TensorFlow GPU; is there any way to check which version I installed?
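One way to answer "which install did I do?" without importing TensorFlow at all is to ask the installed package metadata. The sketch below uses only the standard library; the list of candidate distribution names is an assumption (TF 1.x shipped the GPU build separately as `tensorflow-gpu`). Once TensorFlow imports, `tf.test.is_built_with_cuda()` additionally reports whether the installed build has CUDA support.

```python
# Sketch: report which TensorFlow distributions pip/conda installed.
# Assumption: the candidate names below cover the builds you care about.
from importlib import metadata  # stdlib, Python 3.8+

CANDIDATES = ("tensorflow", "tensorflow-gpu", "tensorflow-cpu")


def installed_tf_distributions(candidates=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed in this environment
    return found


if __name__ == "__main__":
    dists = installed_tf_distributions()
    print(dists or "no TensorFlow distribution found")
    if "tensorflow" in dists and "tensorflow-gpu" in dists:
        # Both wheels provide the same "tensorflow" import package.
        print("CPU and GPU packages are both installed; one may overwrite the other.")
```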

I was trying to run some CNN models from a Jupyter notebook and noticed that Keras was running the model on the CPU (I checked Task Manager; CPU usage was at 100%).

I tried running this code from the TensorFlow website:

import tensorflow as tf  # TensorFlow 1.x API (tf.Session / tf.ConfigProto)

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

And this is what I got:

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.783183: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.784779: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.786128: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 22.  28.]
 [ 49.  64.]]

To me, this shows that everything is running on my CPU for some reason.
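To make that reasoning explicit: each short placement line ends with the device the op was assigned to, and here all of them end in `cpu:0`. A minimal sketch that parses such lines, assuming the `op: (OpType): /device` format shown in the output above (the timestamped duplicate lines are skipped):

```python
import re

# Matches short placement lines such as
# "MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0"
PLACEMENT_RE = re.compile(r"^(?P<op>[^:\s]+): \((?P<type>[^)]+)\): (?P<device>/\S+)$")


def ops_by_device(log_text):
    """Map each op name to the device string it was placed on."""
    placements = {}
    for line in log_text.splitlines():
        m = PLACEMENT_RE.match(line.strip())
        if m:
            placements[m.group("op")] = m.group("device")
    return placements


def all_on_cpu(placements):
    """True if at least one op was parsed and every op landed on a CPU device."""
    return bool(placements) and all(
        re.search(r"/(device:)?cpu:\d+$", dev, re.IGNORECASE) for dev in placements.values()
    )
```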

I have a GTX 1050 (driver version 382.53). I installed CUDA and cuDNN, and TensorFlow installed without any problems. I also installed Visual Studio 2015, since it was listed as a compatible version.

I remember the CUDA installer mentioning that an incompatible driver was installed, but if I recall correctly, CUDA should have installed its own driver.

Edit: I ran these commands to list the available devices:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

and this is what I get:

[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14922788031522107450
]

and a whole lot of warnings like this:

2017-06-29 17:32:45.401429: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.

Edit 2

I tried running

pip3 install --upgrade tensorflow-gpu

and I get:

Requirement already up-to-date: tensorflow-gpu in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages
Requirement already up-to-date: markdown==2.2.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: html5lib==0.9999999 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: werkzeug>=0.11.10 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: wheel>=0.26 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: bleach==1.5.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: six>=1.10.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: protobuf>=3.2.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: backports.weakref==1.0rc1 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: numpy>=1.11.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: setuptools in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from protobuf>=3.2.0->tensorflow-gpu)

Solved: check the comments for the solution. Thanks to all who helped!

I am new to this, so any help is greatly appreciated. Thank you.

To check which devices are available to TensorFlow, you can run this and see whether the GPU cards are listed:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
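`list_local_devices()` returns `DeviceAttributes` protos with `name` and `device_type` fields, so "is there a GPU?" reduces to a filter. A sketch assuming that shape; in TensorFlow 2.1+ `tf.config.list_physical_devices('GPU')` is a public alternative. The `SimpleNamespace` objects only stand in for the real protos so the example runs without TensorFlow.

```python
from types import SimpleNamespace  # only used for the offline stand-in below


def gpu_device_names(devices):
    """Return the names of devices whose device_type is "GPU".

    `devices` is expected to look like device_lib.list_local_devices() output:
    objects exposing `name` and `device_type` attributes.
    """
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    # Offline stand-in for the CPU-only output shown in the question.
    cpu_only = [SimpleNamespace(name="/cpu:0", device_type="CPU")]
    print(gpu_device_names(cpu_only))  # empty list: TensorFlow sees no GPU
```

On a real install, pass `device_lib.list_local_devices()` directly; an empty list means TensorFlow did not find a usable GPU.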

Edit: Also, you should see logs like these if you are using the CUDA build of TensorFlow:

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.*.* locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.*.*  locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.*.*  locally
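If those lines are missing or incomplete, the CUDA libraries were not loaded. A small sketch that scans captured log text for them; the expected set (cuBLAS, cuDNN, cuFFT) is an assumption taken from the sample lines above, and the exact file names differ by platform and version.

```python
import re

# Matches e.g. "... successfully opened CUDA library libcublas.so.*.* locally"
OPENED_RE = re.compile(r"successfully opened CUDA library (\S+)")
EXPECTED = ("libcublas", "libcudnn", "libcufft")  # assumption, from the sample log


def opened_cuda_libraries(log_text):
    """Return base names (text before the first '.') of CUDA libraries reported as opened."""
    return [m.group(1).split(".")[0] for m in OPENED_RE.finditer(log_text)]


def missing_cuda_libraries(log_text, expected=EXPECTED):
    """Return the expected libraries that never appear as opened in the log."""
    opened = set(opened_cuda_libraries(log_text))
    return [lib for lib in expected if lib not in opened]
```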

It may sound dumb, but try rebooting. It helped me and some other folks on GitHub.

I was still having trouble getting GPU support even after correctly installing tensorflow-gpu via pip. My problem was that I had installed TensorFlow 1.5 and CUDA 9.1 (the default version NVIDIA directs you to), whereas the precompiled TensorFlow 1.5 works with CUDA versions <= 9.0. Here is the download page on NVIDIA's site for the correct CUDA 9.0:

https://developer.nvidia.com/cuda-90-download-archive

Also make sure to update your cuDNN to a version compatible with CUDA 9.0: https://developer.nvidia.com/cudnn https://developer.nvidia.com/rdp/cudnn-download
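The rule stated in this answer (prebuilt TensorFlow 1.5 needs CUDA <= 9.0, so 9.1 fails) is a plain version comparison. A sketch assuming dotted numeric version strings; the 9.0 limit is taken from this answer, not looked up in an official table.

```python
def parse_version(text):
    """Turn '9.1' or '9.0.176' into a tuple of ints, e.g. (9, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def cuda_supported(installed, max_supported):
    """True if the installed CUDA release is not newer than max_supported.

    Only as many components as max_supported has are compared,
    so '9.0.176' counts as 9.0.
    """
    n = len(parse_version(max_supported))
    return parse_version(installed)[:n] <= parse_version(max_supported)


if __name__ == "__main__":
    print(cuda_supported("9.1", "9.0"))  # False: the situation described above
```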

For me, the following worked.

I used a conda environment, since a plain Python environment meant setting LD_LIBRARY_PATH and installing CUDA manually, which is another mess.

In the mentioned blog, the author installed cudatoolkit and cudnn inside conda and then installed tensorflow-gpu, which fixed the problem.

PS: as far as I have read, cudatoolkit and cudnn play a huge role in getting your code running on tensorflow-gpu.

If you happen to be using Anaconda to manage your environments, uninstall all existing versions of tensorflow:

pip uninstall tensorflow
pip3 uninstall tensorflow

Then install tensorflow-gpu using conda:

conda install tensorflow-gpu

If you don't mind starting from a new environment, though, the easiest way to do this is:

conda create --name tf_gpu tensorflow-gpu 

This creates a new conda environment named tf_gpu with tensorflow-gpu installed.

I ran into a similar problem. I had the following versions of the TensorFlow libraries:

tensorboard               2.4.1              pyhd8ed1ab_1    conda-forge
tensorboard-plugin-wit    1.8.0              pyh44b312d_0    conda-forge
tensorflow                2.4.1            py39hf3d152e_0    conda-forge
tensorflow-base           2.4.1            py39h23a8cbf_0    conda-forge
tensorflow-estimator      2.4.0              pyh9656e83_0    conda-forge
tensorflow-gpu            2.4.1                h30adc30_0

The same library versions were installed on another machine, where TensorFlow was able to use the GPU. The CUDA toolkit and driver versions were the same on both machines (the one where it worked and the one where it didn't).

It turned out the reason was that tensorflow-gpu 2.4.1 is compatible with Python 3.8.10. Changing my Python version to 3.8.10 and keeping everything else unchanged worked for me!
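If you want a notebook or script to fail fast on the wrong interpreter, here is a sketch of a guard. `EXPECTED` reflects only what worked in this answer (Python 3.8 with tensorflow-gpu 2.4.1) and is an assumption for any other setup.

```python
import sys

# Assumption from this answer: tensorflow-gpu 2.4.1 only used the GPU under Python 3.8.
EXPECTED = (3, 8)


def python_matches(expected=EXPECTED, version_info=None):
    """True if the given (or running) interpreter's major.minor equals `expected`."""
    vi = sys.version_info if version_info is None else version_info
    return tuple(vi[:2]) == tuple(expected)


if __name__ == "__main__":
    if not python_matches():
        print(
            f"Running Python {sys.version_info[0]}.{sys.version_info[1]}, "
            f"expected {EXPECTED[0]}.{EXPECTED[1]}"
        )
```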

If you have problems running TensorFlow on the GPU, check that you have the right versions of CUDA and cuDNN installed. The versions should be exactly the same as here. For example, for TensorFlow v2.8.0 you should have CUDA v11.2 (not newer) and cuDNN v8.1.
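The "exactly the same" check can be written as a table lookup. A sketch: the table below is deliberately partial and contains only the pairing quoted in this answer (TensorFlow 2.8.0 with CUDA 11.2 and cuDNN 8.1); any further rows would have to come from TensorFlow's tested build configurations table.

```python
# Partial table: only the pairing quoted in this answer.
TESTED = {
    "2.8.0": {"cuda": "11.2", "cudnn": "8.1"},
}


def check_versions(tf_version, cuda_version, cudnn_version, table=TESTED):
    """Return a list of human-readable mismatches (an empty list means OK)."""
    expected = table.get(tf_version)
    if expected is None:
        return [f"no entry for TensorFlow {tf_version}; consult the tested build configurations"]
    problems = []
    for lib, have in (("cuda", cuda_version), ("cudnn", cudnn_version)):
        want = expected[lib].split(".")
        # Compare only as many components as the table gives ("11.2" vs "11.2.152").
        if have.split(".")[: len(want)] != want:
            problems.append(f"{lib}: have {have}, expected {expected[lib]}")
    return problems
```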

Also, on Windows you should add the CUDA /bin and /libnvvp folders to your PATH.
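A sketch that reports which of those two folders are missing from PATH. The CUDA root passed in is an assumption (the path in the `__main__` block is the usual default Windows install location for CUDA 11.2; adjust it to your version), and `path_value` lets you test against an arbitrary PATH string.

```python
import os


def missing_cuda_path_entries(cuda_root, path_value=None):
    """Return which of <cuda_root>/bin and <cuda_root>/libnvvp are not on PATH."""
    path_value = os.environ.get("PATH", "") if path_value is None else path_value
    # normcase/normpath so case and separator differences don't cause false misses.
    entries = {os.path.normcase(os.path.normpath(p)) for p in path_value.split(os.pathsep) if p}
    wanted = [os.path.join(cuda_root, "bin"), os.path.join(cuda_root, "libnvvp")]
    return [w for w in wanted if os.path.normcase(os.path.normpath(w)) not in entries]


if __name__ == "__main__" and os.name == "nt":
    # Hypothetical install location; change v11.2 to the CUDA version you installed.
    print(missing_cuda_path_entries(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2"))
```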

This answer is based on this tutorial: Tensorflow 2021 install tutorial. If you still cannot get it running, check whether you missed any steps.

You may also have a CUDA version mismatch that needs to be resolved one way or the other (downgrading/pinning tensorflow to the latest version supported by your system CUDA is arguably quicker, but only doing the opposite is future-proof).

To verify, check the CUDA version your installed TensorFlow package was built with:

>>> import tensorflow as tf
>>> tf.sysconfig.get_build_info()['cuda_version']
'11.8'

... and compare it with the CUDA version installed on the host / in the container / VM:

>>> import os
>>> os.system("nvcc --version")

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
0
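The two steps can be combined in one script. A sketch, assuming the `release X.Y` wording in the `nvcc --version` output shown above; an exact major.minor comparison is the strict check this answer describes, not an official compatibility rule.

```python
import re
import subprocess

RELEASE_RE = re.compile(r"release (\d+\.\d+)")


def nvcc_release(nvcc_output):
    """Extract 'X.Y' from `nvcc --version` output, or None if not found."""
    m = RELEASE_RE.search(nvcc_output)
    return m.group(1) if m else None


def run_nvcc():
    """Return the stdout of `nvcc --version`, or None if nvcc is unavailable."""
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


def cuda_mismatch(tf_build_cuda, nvcc_output):
    """True/False if the versions differ/match; None if either one is unknown."""
    installed = nvcc_release(nvcc_output or "")
    if installed is None or tf_build_cuda is None:
        return None
    return installed != tf_build_cuda
```

With TensorFlow installed, call `cuda_mismatch(tf.sysconfig.get_build_info().get("cuda_version"), run_nvcc())`; `.get()` is used because a CPU-only build may not report a `cuda_version` at all. `True` corresponds to the 11.8 vs. 11.2 situation above.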

More info

When tensorflow imports cleanly (without any warnings) but detects only the CPU on a GPU-equipped machine with the CUDA libraries installed, you may have a CUDA version mismatch between the pre-compiled tensorflow package wheel and the versions installed on the system or in the container.

The CUDA version mismatch above (v11.8 used when compiling TensorFlow vs. the v11.2 CUDA compiler installed in the container) left TF without GPU access, even though nvidia-smi worked correctly.

See also: the Tensorflow CUDA compatibility table (tested build configurations).

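Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.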

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM