CUDA 10.1 installed but TensorFlow doesn't run simulation on GPU

CUDA 10.1 and the NVIDIA driver v440 are installed on my Ubuntu 18.04 system. I don't understand why the nvidia-smi tool reports CUDA version 10.2 when the installed version is 10.1 (see further down).

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1200        On   | 00000000:01:00.0  On |                  N/A |
| N/A   45C    P0    N/A /  N/A |    962MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1346      G   /usr/lib/xorg/Xorg                           107MiB |
|    0      1647      G   /usr/bin/gnome-shell                          57MiB |
|    0      2521      G   /usr/lib/xorg/Xorg                           414MiB |
|    0      2655      G   /usr/bin/gnome-shell                         206MiB |
|    0      3549      C   python                                        26MiB |
|    0      4236      G   ...quest-channel-token=1063048282371062146   139MiB |
+-----------------------------------------------------------------------------+

Whenever I try to run a TensorFlow (Python) program, it seems to detect the GPU on my laptop correctly, but it produces a number of errors during initialization and does not run the simulation on the GPU, as the GPU usage shown above attests.

2020-02-13 17:37:53.162545: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-13 17:37:53.167709: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-02-13 17:37:53.215323: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 17:37:53.215893: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56196a0c1980 executing computations on platform CUDA. Devices:
2020-02-13 17:37:53.215913: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Quadro M1200, Compute Capability 5.0
2020-02-13 17:37:53.235780: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2020-02-13 17:37:53.236381: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56196c491c70 executing computations on platform Host. Devices:
2020-02-13 17:37:53.236413: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-02-13 17:37:53.236721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 17:37:53.237160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Quadro M1200 major: 5 minor: 0 memoryClockRate(GHz): 1.148
pciBusID: 0000:01:00.0
2020-02-13 17:37:53.237367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237508: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237645: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237948: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.238083: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.243683: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-02-13 17:37:53.243719: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 17:37:53.243745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 17:37:53.243760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2020-02-13 17:37:53.243772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2020-02-13 17:37:53.273148: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/xxxxxxx/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

Some facts about the system and the installed packages:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:    18.04
Codename:   bionic

# dpkg --get-selections |grep -i cuda
cuda                        install
cuda-10-1                   install
cuda-command-line-tools-10-1            install
cuda-compiler-10-1              install
cuda-cudart-10-1                install
cuda-cudart-dev-10-1                install
cuda-cufft-10-1                 install
cuda-cufft-dev-10-1             install
cuda-cuobjdump-10-1             install
cuda-cupti-10-1                 install
cuda-curand-10-1                install
cuda-curand-dev-10-1                install
cuda-cusolver-10-1              install
cuda-cusolver-dev-10-1              install
cuda-cusparse-10-1              install
cuda-cusparse-dev-10-1              install
cuda-demo-suite-10-1                install
cuda-documentation-10-1             install
cuda-driver-dev-10-1                install
cuda-drivers                    install
cuda-gdb-10-1                   install
cuda-gpu-library-advisor-10-1           install
cuda-libraries-10-1             install
cuda-libraries-dev-10-1             install
cuda-license-10-1               install
cuda-license-10-2               install
cuda-memcheck-10-1              install
cuda-misc-headers-10-1              install
cuda-npp-10-1                   install
cuda-npp-dev-10-1               install
cuda-nsight-10-1                install
cuda-nsight-compute-10-1            install
cuda-nsight-systems-10-1            install
cuda-nvcc-10-1                  install
cuda-nvdisasm-10-1              install
cuda-nvgraph-10-1               install
cuda-nvgraph-dev-10-1               install
cuda-nvjpeg-10-1                install
cuda-nvjpeg-dev-10-1                install
cuda-nvml-dev-10-1              install
cuda-nvprof-10-1                install
cuda-nvprune-10-1               install
cuda-nvrtc-10-1                 install
cuda-nvrtc-dev-10-1             install
cuda-nvtx-10-1                  install
cuda-nvvp-10-1                  install
cuda-repo-ubuntu1804                install
cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01   deinstall
cuda-runtime-10-1               install
cuda-samples-10-1               install
cuda-sanitizer-api-10-1             install
cuda-toolkit-10-1               install
cuda-tools-10-1                 install
cuda-visual-tools-10-1              install


# dpkg --get-selections |grep -P 'nvidia-[^\s]+\s+install$'
libnvidia-cfg1-440:amd64            install
libnvidia-common-435                install
libnvidia-common-440                install
libnvidia-compute-440:amd64         install
libnvidia-decode-440:amd64          install
libnvidia-encode-440:amd64          install
libnvidia-fbc1-440:amd64            install
libnvidia-gl-440:amd64              install
libnvidia-ifr1-440:amd64            install
nvidia-compute-utils-440            install
nvidia-dkms-440                 install
nvidia-driver-440               install
nvidia-kernel-common-440            install
nvidia-kernel-source-440            install
nvidia-machine-learning-repo-ubuntu1804     install
nvidia-modprobe                 install
nvidia-prime                    install
nvidia-settings                 install
nvidia-utils-440                install
xserver-xorg-video-nvidia-440           install

$ pip list|grep -i tensorflow
tensorflow-estimator (1.14.0)
tensorflow-gpu (1.14.0)

Is there anything else I need to do for Python TensorFlow simulations to run on the GPU? How can I diagnose this?

From the log line "Could not dlopen library 'libcudart.so.10.0'" we can tell that your tensorflow package was built against CUDA 10.0, while only CUDA 10.1 is installed. You should either install CUDA 10.0 or build TensorFlow from source (against CUDA 10.1 or 10.2) yourself.
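
To check this outside TensorFlow, one option is to try loading the same shared libraries directly. The snippet below is a minimal sketch, not part of TensorFlow: the library names are copied from the log in the question, and the helper name check_libraries is made up for illustration. It asks the dynamic loader for each library through ctypes and reports which ones cannot be found.

```python
import ctypes

# Library names copied from the dso_loader lines in the question's log.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def check_libraries(names):
    """Hypothetical helper: map each name to None if it loads, else the error text."""
    results = {}
    for name in names:
        try:
            # Uses the normal dynamic-loader search path, like dlopen by name.
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_libraries(CUDA_10_0_LIBS).items():
        status = "OK" if error is None else "MISSING ({})".format(error)
        print("{}: {}".format(name, status))
```

If CUDA 10.0 is installed but the libraries are still reported as missing, the directory that holds them is probably not on the loader's search path (for example via LD_LIBRARY_PATH or the ldconfig cache), which would produce the same messages in the TensorFlow log.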
