CUDA 10.1 installed but Tensorflow doesn't run simulation on GPU
CUDA 10.1 and the NVIDIA v440 drivers are installed on my Ubuntu 18.04 system. I don't understand why the nvidia-smi tool reports CUDA version 10.2 when the installed version is 10.1 (see further down).
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M1200        On   | 00000000:01:00.0  On |                  N/A |
| N/A   45C    P0    N/A /  N/A |    962MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1346      G   /usr/lib/xorg/Xorg                           107MiB |
|    0      1647      G   /usr/bin/gnome-shell                          57MiB |
|    0      2521      G   /usr/lib/xorg/Xorg                           414MiB |
|    0      2655      G   /usr/bin/gnome-shell                         206MiB |
|    0      3549      C   python                                        26MiB |
|    0      4236      G   ...quest-channel-token=1063048282371062146   139MiB |
+-----------------------------------------------------------------------------+
Whenever I try to run a TensorFlow (Python) program, it seems to detect the GPU on my laptop correctly, but it produces a number of errors during initialization and does not run the simulation on the GPU, as the GPU usage shown above attests.
2020-02-13 17:37:53.162545: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-13 17:37:53.167709: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-02-13 17:37:53.215323: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 17:37:53.215893: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56196a0c1980 executing computations on platform CUDA. Devices:
2020-02-13 17:37:53.215913: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Quadro M1200, Compute Capability 5.0
2020-02-13 17:37:53.235780: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2020-02-13 17:37:53.236381: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56196c491c70 executing computations on platform Host. Devices:
2020-02-13 17:37:53.236413: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-02-13 17:37:53.236721: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 17:37:53.237160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Quadro M1200 major: 5 minor: 0 memoryClockRate(GHz): 1.148
pciBusID: 0000:01:00.0
2020-02-13 17:37:53.237367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237508: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237645: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.237948: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.238083: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-02-13 17:37:53.243683: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-02-13 17:37:53.243719: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 17:37:53.243745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 17:37:53.243760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-02-13 17:37:53.243772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-02-13 17:37:53.273148: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/xxxxxxx/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
Some facts about the system and the installed packages:
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
# dpkg --get-selections |grep -i cuda
cuda install
cuda-10-1 install
cuda-command-line-tools-10-1 install
cuda-compiler-10-1 install
cuda-cudart-10-1 install
cuda-cudart-dev-10-1 install
cuda-cufft-10-1 install
cuda-cufft-dev-10-1 install
cuda-cuobjdump-10-1 install
cuda-cupti-10-1 install
cuda-curand-10-1 install
cuda-curand-dev-10-1 install
cuda-cusolver-10-1 install
cuda-cusolver-dev-10-1 install
cuda-cusparse-10-1 install
cuda-cusparse-dev-10-1 install
cuda-demo-suite-10-1 install
cuda-documentation-10-1 install
cuda-driver-dev-10-1 install
cuda-drivers install
cuda-gdb-10-1 install
cuda-gpu-library-advisor-10-1 install
cuda-libraries-10-1 install
cuda-libraries-dev-10-1 install
cuda-license-10-1 install
cuda-license-10-2 install
cuda-memcheck-10-1 install
cuda-misc-headers-10-1 install
cuda-npp-10-1 install
cuda-npp-dev-10-1 install
cuda-nsight-10-1 install
cuda-nsight-compute-10-1 install
cuda-nsight-systems-10-1 install
cuda-nvcc-10-1 install
cuda-nvdisasm-10-1 install
cuda-nvgraph-10-1 install
cuda-nvgraph-dev-10-1 install
cuda-nvjpeg-10-1 install
cuda-nvjpeg-dev-10-1 install
cuda-nvml-dev-10-1 install
cuda-nvprof-10-1 install
cuda-nvprune-10-1 install
cuda-nvrtc-10-1 install
cuda-nvrtc-dev-10-1 install
cuda-nvtx-10-1 install
cuda-nvvp-10-1 install
cuda-repo-ubuntu1804 install
cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01 deinstall
cuda-runtime-10-1 install
cuda-samples-10-1 install
cuda-sanitizer-api-10-1 install
cuda-toolkit-10-1 install
cuda-tools-10-1 install
cuda-visual-tools-10-1 install
# dpkg --get-selections |grep -P 'nvidia-[^\s]+\s+install$'
libnvidia-cfg1-440:amd64 install
libnvidia-common-435 install
libnvidia-common-440 install
libnvidia-compute-440:amd64 install
libnvidia-decode-440:amd64 install
libnvidia-encode-440:amd64 install
libnvidia-fbc1-440:amd64 install
libnvidia-gl-440:amd64 install
libnvidia-ifr1-440:amd64 install
nvidia-compute-utils-440 install
nvidia-dkms-440 install
nvidia-driver-440 install
nvidia-kernel-common-440 install
nvidia-kernel-source-440 install
nvidia-machine-learning-repo-ubuntu1804 install
nvidia-modprobe install
nvidia-prime install
nvidia-settings install
nvidia-utils-440 install
xserver-xorg-video-nvidia-440 install
$ pip list|grep -i tensorflow
tensorflow-estimator (1.14.0)
tensorflow-gpu (1.14.0)
Is there anything else I need to do for Python TensorFlow simulations to run on the GPU? How can I diagnose this?
The log line Could not dlopen library 'libcudart.so.10.0' shows that your tensorflow package was built against CUDA 10.0, while only CUDA 10.1 is installed. You should either install CUDA 10.0 or build TensorFlow from source yourself (against CUDA 10.1 or 10.2).
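As a rough way to check this on your own machine, the sketch below pulls the library names out of the "Could not dlopen library" log lines and tries to load each one with Python's ctypes, which on Linux also goes through dlopen. The function names (libs_reported_missing, can_dlopen) and the shortened sample log are my own, for illustration only; nothing here is part of TensorFlow.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2.7

import ctypes
import re

# Matches the library name inside TensorFlow's dso_loader failure messages.
DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def libs_reported_missing(log_text):
    """Return the library names that the log says could not be opened."""
    return DLOPEN_FAILURE.findall(log_text)


def can_dlopen(name):
    """Return True if the dynamic loader can find and load `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Shortened sample taken from the log in the question (assumption: you
    # would paste or read your own log here instead).
    sample_log = (
        "Could not dlopen library 'libcudart.so.10.0'; dlerror: ...\n"
        "Could not dlopen library 'libcublas.so.10.0'; dlerror: ...\n"
    )
    for name in libs_reported_missing(sample_log):
        status = "found" if can_dlopen(name) else "MISSING"
        print("{:<24} {}".format(name, status))
```

If a library is still reported as missing after you install CUDA 10.0, the directory that contains it is probably not on the loader's search path (for example via LD_LIBRARY_PATH); where that directory is depends on how CUDA was installed.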