CUDA 10.1 installed but Tensorflow doesn't run simulation on GPU

Question

CUDA 10.1 and the NVIDIA drivers v440 are installed on my Ubuntu 18.04 system. I don't understand why the nvidia-smi tool reports CUDA version 10.2 when the installed version is 10.1 (see further down).

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1200        On   | 00000000:01:00.0  On |                  N/A |
| N/A   45C    P0    N/A /  N/A |    962MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1346      G   /usr/lib/xorg/Xorg                           107MiB |
|    0      1647      G   /usr/bin/gnome-shell                          57MiB |
|    0      2521      G   /usr/lib/xorg/Xorg                           414MiB |
|    0      2655      G   /usr/bin/gnome-shell                         206MiB |
|    0      3549      C   python                                        26MiB |
|    0      4236      G   ...quest-channel-token=1063048282371062146   139MiB |
+-----------------------------------------------------------------------------+

Whenever I try to run a Tensorflow (Python) program, it seems to detect the GPU on my laptop correctly, but it produces a number of errors during initialization and does not run the simulation on the GPU, as the GPU usage shown above attests.

2020-02-13 17:37:53.162545: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-13 17:37:53.167709: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-02-13 17:37:53.215323: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 17:37:53.215893: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56196a0c1980 executing computations on platform CUDA. Devices:
2020-02-13 17:37:53.215913: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Quadro M1200, Compute Capability 5.0
2020-02-13 17:37:53.235780: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2020-02-13 17:37:53.236381: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56196c491c70 executing computations on platform Host. Devices:
2020-02-13 17:37:53.236413: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-02-13 17:37:53.236721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 17:37:53.237160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Quadro M1200 major: 5 minor: 0 memoryClockRate(GHz): 1.148
pciBusID: 0000:01:00.0
2020-02-13 17:37:53.237367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237508: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237645: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237948: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.238083: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.243683: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-02-13 17:37:53.243719: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 17:37:53.243745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 17:37:53.243760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2020-02-13 17:37:53.243772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2020-02-13 17:37:53.273148: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/xxxxxxx/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

Some facts about the system and the installed packages:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:    18.04
Codename:   bionic

# dpkg --get-selections |grep -i cuda
cuda                        install
cuda-10-1                   install
cuda-command-line-tools-10-1            install
cuda-compiler-10-1              install
cuda-cudart-10-1                install
cuda-cudart-dev-10-1                install
cuda-cufft-10-1                 install
cuda-cufft-dev-10-1             install
cuda-cuobjdump-10-1             install
cuda-cupti-10-1                 install
cuda-curand-10-1                install
cuda-curand-dev-10-1                install
cuda-cusolver-10-1              install
cuda-cusolver-dev-10-1              install
cuda-cusparse-10-1              install
cuda-cusparse-dev-10-1              install
cuda-demo-suite-10-1                install
cuda-documentation-10-1             install
cuda-driver-dev-10-1                install
cuda-drivers                    install
cuda-gdb-10-1                   install
cuda-gpu-library-advisor-10-1           install
cuda-libraries-10-1             install
cuda-libraries-dev-10-1             install
cuda-license-10-1               install
cuda-license-10-2               install
cuda-memcheck-10-1              install
cuda-misc-headers-10-1              install
cuda-npp-10-1                   install
cuda-npp-dev-10-1               install
cuda-nsight-10-1                install
cuda-nsight-compute-10-1            install
cuda-nsight-systems-10-1            install
cuda-nvcc-10-1                  install
cuda-nvdisasm-10-1              install
cuda-nvgraph-10-1               install
cuda-nvgraph-dev-10-1               install
cuda-nvjpeg-10-1                install
cuda-nvjpeg-dev-10-1                install
cuda-nvml-dev-10-1              install
cuda-nvprof-10-1                install
cuda-nvprune-10-1               install
cuda-nvrtc-10-1                 install
cuda-nvrtc-dev-10-1             install
cuda-nvtx-10-1                  install
cuda-nvvp-10-1                  install
cuda-repo-ubuntu1804                install
cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01   deinstall
cuda-runtime-10-1               install
cuda-samples-10-1               install
cuda-sanitizer-api-10-1             install
cuda-toolkit-10-1               install
cuda-tools-10-1                 install
cuda-visual-tools-10-1              install


# dpkg --get-selections |grep -P 'nvidia-[^\s]+\s+install$'
libnvidia-cfg1-440:amd64            install
libnvidia-common-435                install
libnvidia-common-440                install
libnvidia-compute-440:amd64         install
libnvidia-decode-440:amd64          install
libnvidia-encode-440:amd64          install
libnvidia-fbc1-440:amd64            install
libnvidia-gl-440:amd64              install
libnvidia-ifr1-440:amd64            install
nvidia-compute-utils-440            install
nvidia-dkms-440                 install
nvidia-driver-440               install
nvidia-kernel-common-440            install
nvidia-kernel-source-440            install
nvidia-machine-learning-repo-ubuntu1804     install
nvidia-modprobe                 install
nvidia-prime                    install
nvidia-settings                 install
nvidia-utils-440                install
xserver-xorg-video-nvidia-440           install

$ pip list|grep -i tensorflow
tensorflow-estimator (1.14.0)
tensorflow-gpu (1.14.0)

Is there anything else I need to do for Python Tensorflow simulations to run on the GPU? How can I diagnose this?

Answer 1

The log line Could not dlopen library 'libcudart.so.10.0' shows that your tensorflow-gpu package was built against CUDA 10.0, while only the CUDA 10.1 libraries are installed. Either install CUDA 10.0, or build Tensorflow from source yourself (against CUDA 10.1 or 10.2).
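
As for how to diagnose this, one possible approach is to pull the library names out of the "Could not dlopen library" log lines and ask the dynamic loader directly whether each one can be opened. The following is a minimal sketch, not part of Tensorflow: the helper names missing_from_log and can_load are made up for illustration, and it assumes the Python process sees the same loader search path (for example LD_LIBRARY_PATH) as the Tensorflow program.

```python
import ctypes
import re

# Matches the library name in log lines such as:
#   Could not dlopen library 'libcudart.so.10.0'; dlerror: ...
DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def missing_from_log(log_text):
    """Return the library names the log reports as failing to dlopen."""
    return DLOPEN_FAILURE.findall(log_text)


def can_load(library_name):
    """Return True if the dynamic loader can find and open library_name."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Compare the version Tensorflow asks for (10.0) with the installed one (10.1).
    for suffix in ("10.0", "10.1"):
        name = "libcudart.so." + suffix
        status = "loadable" if can_load(name) else "NOT loadable"
        print("{}: {}".format(name, status))
```

If libcudart.so.10.1 is reported as loadable and libcudart.so.10.0 is not, that is consistent with the version mismatch described above. If neither is loadable, the loader search path is probably worth checking as well.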

Answer 1
1 accepted, 2020-02-13 20:46:52
