
TensorFlow doesn't use GPU, finds xla_gpu not gpu

I just started to explore AI and have never used TensorFlow; even Linux is new to me.

I previously installed NVIDIA driver 430, which comes with CUDA 10.1.


Since tensorflow-gpu 1.14 doesn't support CUDA 10.1, I uninstalled CUDA 10.1 and downloaded CUDA 10.0:

cuda_10.0.130_410.48_linux.run

Once installed, I ran:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

When I tried to use the GPU in a Jupyter Notebook, the code still doesn't work:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print (sess.run(c))

Error:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1355     try:
-> 1356       return fn(*args)
   1357     except errors.OpError as e:

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1338       # Ensure any changes to the graph are reflected in the runtime.
-> 1339       self._extend_graph()
   1340       return self._call_tf_sessionrun(

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1373     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1374       tf_session.ExtendSession(self._session)
   1375 

InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-19-3a5be606bcc9> in <module>
      6 
      7 with tf.Session() as sess:
----> 8     print (sess.run(c))

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    948     try:
    949       result = self._run(None, fetches, feed_dict, options_ptr,
--> 950                          run_metadata_ptr)
    951       if run_metadata:
    952         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1171     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1172       results = self._do_run(handle, final_targets, final_fetches,
-> 1173                              feed_dict_tensor, options, run_metadata)
   1174     else:
   1175       results = []

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1348     if handle is None:
   1349       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1350                            run_metadata)
   1351     else:
   1352       return self._do_call(_prun_fn, handle, feeds, fetches)

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1368           pass
   1369       message = error_interpolation.interpolate(message, self._graph)
-> 1370       raise type(e)(node_def, op, message)
   1371 
   1372   def _extend_graph(self):

InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <ipython-input-9-b145a02709f7>:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 b (defined at <ipython-input-9-b145a02709f7>:4)    
 a (defined at <ipython-input-9-b145a02709f7>:3)

But if I run this code from the terminal in Python, it works. I can see the output:

[[22. 28.]
 [49. 64.]]
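
A minimal sketch of how to check which devices this TensorFlow 1.x runtime actually registers (device_lib is the usual way to enumerate devices in 1.14); if only XLA_GPU:0 shows up, tf.device('/gpu:0') will keep failing:

import tensorflow as tf
from tensorflow.python.client import device_lib

# List every device the runtime has registered. A working CUDA/cuDNN
# setup should include a /device:GPU:0 entry, not just XLA_GPU:0.
for d in device_lib.list_local_devices():
    print(d.name, d.device_type)

# True only when a real (non-XLA) GPU device is usable, which is what
# tf.device('/gpu:0') requires.
print(tf.test.is_gpu_available())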

You need to make sure you have the appropriate CUDA and cuDNN versions installed.

  • You can check your cuDNN version with the advice from this link: How to verify CuDNN installation?
    • or by running cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2 on a Linux machine
  • You can check your CUDA version here: xcat.docs
    • nvcc -V
    • or by running nvidia-smi
  • And read about xla_gpus here: tensorflow xla and here: github xla_gpu issue
    • XLA was made by the TensorFlow team and can be faster than standard TensorFlow.
    • I'm not sure why CUDA without cuDNN exposes the GPU as an xla_gpu. NVIDIA GPUs need both CUDA and cuDNN to work properly with TensorFlow, so it looks like TensorFlow is falling back to its own compiler to compute on the GPU. But I'm not really sure. A short verification sketch follows this list.
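
A rough verification sketch after installing versions that match tensorflow-gpu 1.14 (its tested configuration is CUDA 10.0 with cuDNN 7.4); the allow_soft_placement flag is only a workaround that lets the op fall back to the CPU, not a fix:

import tensorflow as tf

# With a matching CUDA/cuDNN install, the GPU should appear as a real
# /device:GPU:0 device rather than only XLA_GPU:0.
print(tf.test.is_gpu_available())    # expect True
print(tf.test.gpu_device_name())     # expect '/device:GPU:0'

# Until then, allow_soft_placement lets MatMul fall back to the CPU
# instead of raising InvalidArgumentError when /gpu:0 is unavailable.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session(config=config) as sess:
    print(sess.run(c))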
