TensorFlow issue when running code with GPU (CUDA-11.0) on Ubuntu 20.04 LTS

Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory

Can someone help me solve the above problem?

When I try to execute the following code:

import tensorflow as tf
if __name__ == '__main__':
    print(tf.test.is_built_with_cuda())
    print(tf.config.list_physical_devices('GPU'))

I get the following error log:

2021-03-07 23:47:41.236741: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
True
2021-03-07 23:47:41.953930: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-03-07 23:47:41.954322: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
[]
2021-03-07 23:47:41.981245: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-03-07 23:47:41.981758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:06:00.0 name: GeForce GTX 970 computeCapability: 5.2
coreClock: 1.329GHz coreCount: 13 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 208.91GiB/s
2021-03-07 23:47:41.981769: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-07 23:47:41.983137: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-03-07 23:47:41.983159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-03-07 23:47:41.984153: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-03-07 23:47:41.984274: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-03-07 23:47:41.985206: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-03-07 23:47:41.985276: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-03-07 23:47:41.985339: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-03-07 23:47:41.985344: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

Process finished with exit code 0

I manually checked the folder /usr/local/cuda-11.0/lib64, and the file mentioned in the log, libcusparse.so.11, is not there either.
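The manual folder check can be complemented by asking the dynamic loader directly. The following is a minimal sketch, not part of the original post: it tries to open each library name that appears in the log above with ctypes, which relies on the same system loader search path (for example LD_LIBRARY_PATH and the ldconfig cache). The helper name find_missing_libraries is hypothetical.

```python
import ctypes

# Library names copied from the TensorFlow log above.
CUDA_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def find_missing_libraries(names):
    """Return the library names that the dynamic loader cannot open.

    Hypothetical helper for illustration; it is not a TensorFlow API.
    """
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_libraries(CUDA_LIBS):
        print("cannot load:", name)
```

On the setup described here, this would be expected to report libcusparse.so.11, matching the warning in the log. Any other name it reports points to a further library that is missing or not on the loader path.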


I followed the official TensorFlow installation steps.

Environment:

  • OS: Ubuntu 20.04.2 LTS
  • GPU: GeForce GTX 970
  • Driver version: 450.102.04
  • CUDA Toolkit: V11.0
  • cuDNN: V8.0.4.30 (not completely sure how to check)
  • Python: V3.7 in an Anaconda virtual environment
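Since the cuDNN version in the list above is marked as not fully verified, here is one possible way to check it. cuDNN 8 defines the macros CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in its cudnn_version.h header, so a small script can read them. This is a sketch under assumptions: the two header paths are common install locations and may differ on a given system, and parse_cudnn_version is a hypothetical helper name. The macros carry only the three-part version (for example 8.0.4), not the fourth component of the package version.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn_version.h text, or None.

    Hypothetical helper for illustration; it is not part of cuDNN or TensorFlow.
    """
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + macro + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None  # macro not found, probably not a cuDNN 8 header
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Assumed header locations; adjust them to the actual install.
    candidates = (
        "/usr/include/cudnn_version.h",
        "/usr/local/cuda/include/cudnn_version.h",
    )
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            print(candidate, parse_cudnn_version(path.read_text()))
```

If neither path exists, the header may live elsewhere (for instance under /usr/local/cuda-11.0/include), or only the runtime library without the development headers may be installed.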

I was able to fix the problem by simply reinstalling Ubuntu and using the "one-liner" from Lambda-Stack.

LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda
sudo reboot
