Tensorflow doesn't use GPU, finds xla_gpu not gpu
I just started to explore AI and have never used Tensorflow; even Linux is new to me.
I had previously installed NVIDIA driver 430, which comes with CUDA 10.1. Since Tensorflow-gpu 1.14 doesn't support CUDA 10.1, I uninstalled CUDA 10.1 and downloaded CUDA 10.0:
cuda_10.0.130_410.48_linux.run
Once installed, I ran `nvcc --version`:

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
```

When I tried to use the GPU in a Jupyter Notebook, the code still doesn't work:
```python
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print(sess.run(c))
```
Error:
```
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1355     try:
-> 1356       return fn(*args)
   1357     except errors.OpError as e:

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1338       # Ensure any changes to the graph are reflected in the runtime.
-> 1339       self._extend_graph()
   1340       return self._call_tf_sessionrun(

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1373     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1374       tf_session.ExtendSession(self._session)
   1375

InvalidArgumentError: Cannot assign a device for operation MatMul: {{node MatMul}} was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-19-3a5be606bcc9> in <module>
      6
      7 with tf.Session() as sess:
----> 8     print (sess.run(c))

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    948     try:
    949       result = self._run(None, fetches, feed_dict, options_ptr,
--> 950                          run_metadata_ptr)
    951       if run_metadata:
    952         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1171     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1172       results = self._do_run(handle, final_targets, final_fetches,
-> 1173                              feed_dict_tensor, options, run_metadata)
   1174     else:
   1175       results = []

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1348     if handle is None:
   1349       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1350                            run_metadata)
   1351     else:
   1352       return self._do_call(_prun_fn, handle, feeds, fetches)

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1368         pass
   1369       message = error_interpolation.interpolate(message, self._graph)
-> 1370       raise type(e)(node_def, op, message)
   1371
   1372   def _extend_graph(self):

InvalidArgumentError: Cannot assign a device for operation MatMul: node MatMul (defined at <ipython-input-9-b145a02709f7>:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
     [[MatMul]]

Errors may have originated from an input operation.
Input Source operations connected to node MatMul:
 b (defined at <ipython-input-9-b145a02709f7>:4)
 a (defined at <ipython-input-9-b145a02709f7>:3)
```
But if I run this same code from a Python terminal, it works, and I can see the output:

```
[[22. 28.]
 [49. 64.]]
```
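As a side note, the `nvcc --version` output above can be checked programmatically. This is a minimal sketch, not part of the original post; the helper `cuda_release` and the embedded sample text are illustrative only:

```python
import re

# Sample output of `nvcc --version`, copied from the question above.
NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
"""

def cuda_release(nvcc_output):
    """Extract the CUDA release (e.g. '10.0') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None

print(cuda_release(NVCC_OUTPUT))  # prints 10.0
```

In practice you would feed it the real output, e.g. `subprocess.check_output(["nvcc", "--version"], text=True)`.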
You need to make sure you have the appropriate CUDA and cuDNN versions installed.

You can check your cuDNN version with the advice from this link: How to verify CuDNN installation? On a Linux machine:

```
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
```

You can check your CUDA version as described here: xcat.docs

```
nvcc -V
nvidia-smi
```

You can read more about the `xla_gpu` devices here: tensorflow xla, and here: github xla_gpu issue.

CUDA without cuDNN reports GPUs as `xla_gpu`s. Nvidia GPUs need both CUDA and cuDNN to work properly with Tensorflow, so it looks like Tensorflow is falling back to its own XLA library to compute on the GPU.
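The `grep CUDNN_MAJOR` check above can also be sketched in Python. The `#define` values below are placeholders (run the `cat` command above to see your own), and the helper `cudnn_version` is hypothetical, not part of any cuDNN API:

```python
import re

# Example lines as they appear in /usr/local/cuda/include/cudnn.h
# (placeholder values for illustration).
CUDNN_HEADER = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""

def cudnn_version(header_text):
    """Assemble 'major.minor.patchlevel' from the cudnn.h #defines."""
    fields = dict(re.findall(r"#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) (\d+)", header_text))
    return "{MAJOR}.{MINOR}.{PATCHLEVEL}".format(**fields)

print(cudnn_version(CUDNN_HEADER))  # prints 7.6.5
```

To check a real installation, read the header with `open("/usr/local/cuda/include/cudnn.h").read()` and pass it in.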