Tensorflow doesn't use GPU, finds xla_gpu not gpu

I have just started exploring AI. I have never used Tensorflow before, and even Linux is new to me.

I had previously installed NVIDIA driver 430, which came with CUDA 10.1.


Since Tensorflow-gpu 1.14 does not support CUDA 10.1, I uninstalled CUDA 10.1 and downloaded CUDA 10.0:

cuda_10.0.130_410.48_linux.run

After installing it, I ran:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

When I tried to use the GPU in a Jupyter Notebook, the code still did not work:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print (sess.run(c))

Error:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1355     try:
-> 1356       return fn(*args)
   1357     except errors.OpError as e:

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1338       # Ensure any changes to the graph are reflected in the runtime.
-> 1339       self._extend_graph()
   1340       return self._call_tf_sessionrun(

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1373     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1374       tf_session.ExtendSession(self._session)
   1375 

InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-19-3a5be606bcc9> in <module>
      6 
      7 with tf.Session() as sess:
----> 8     print (sess.run(c))

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    948     try:
    949       result = self._run(None, fetches, feed_dict, options_ptr,
--> 950                          run_metadata_ptr)
    951       if run_metadata:
    952         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1171     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1172       results = self._do_run(handle, final_targets, final_fetches,
-> 1173                              feed_dict_tensor, options, run_metadata)
   1174     else:
   1175       results = []

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1348     if handle is None:
   1349       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1350                            run_metadata)
   1351     else:
   1352       return self._do_call(_prun_fn, handle, feeds, fetches)

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1368           pass
   1369       message = error_interpolation.interpolate(message, self._graph)
-> 1370       raise type(e)(node_def, op, message)
   1371 
   1372   def _extend_graph(self):

InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <ipython-input-9-b145a02709f7>:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 b (defined at <ipython-input-9-b145a02709f7>:4)    
 a (defined at <ipython-input-9-b145a02709f7>:3)

However, if I run this code from the terminal in Python, it works and I can see the output:

[[22. 28.]
 [49. 64.]]
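It is not obvious from that output alone which device the terminal run actually used; it may have fallen back to the CPU. A minimal sketch of how to check, assuming TensorFlow 1.x, is to enable device placement logging and soft placement:

import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# log_device_placement prints the device each op is actually placed on;
# allow_soft_placement lets TensorFlow fall back to the CPU instead of
# raising InvalidArgumentError when /device:GPU:0 does not exist.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))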

You need to make sure you have the appropriate CUDA and CuDNN versions installed (a quick TensorFlow-side check is also sketched after the list below).

  • You can check your CuDNN version by following the suggestions in this link: How to verify CuDNN installation?
    • Or by running cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2 on a Linux machine
  • You can check your CUDA version here: xcat.docs
    • nvcc -V
    • Or by running nvidia-smi
  • And read about xla_gpu here: tensorflow xla, and here: github xla_gpu issue
    • XLA is built by Tensorflow and is faster than standard Tensorflow.
    • I am not sure why the GPUs show up as xla_gpus when CUDA and CuDNN are not available. Nvidia GPUs need CUDA and CuDNN to work properly with Tensorflow, so it looks like Tensorflow is trying to use its own library to compute on the GPU. However, I am not sure.
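Beyond the command-line checks above, here is a minimal sketch (assuming tensorflow-gpu 1.14) of how to verify from Python that TensorFlow was built against CUDA and actually sees a real GPU device rather than only XLA_GPU:

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if this TensorFlow build was compiled with CUDA support
# (the CPU-only "tensorflow" package returns False here).
print("Built with CUDA:", tf.test.is_built_with_cuda())

# True only if a CUDA-capable GPU is visible and usable by TensorFlow.
print("GPU available: ", tf.test.is_gpu_available())

# A working CUDA 10.0 + cuDNN setup should list a /device:GPU:0 entry
# here, in addition to the XLA_CPU / XLA_GPU devices from the error above.
for device in device_lib.list_local_devices():
    print(device.name)

If the first check is False, the CPU-only tensorflow package may be installed instead of tensorflow-gpu; if it is True but no /device:GPU:0 appears, the CUDA/cuDNN installation is the more likely culprit, as described above.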
