
System76 Ubuntu 20.04 TensorFlow GPU CUDA version conflicts

After upgrading from Ubuntu 18.04 to 20.04, TensorFlow is no longer able to use my GPU because it is trying to load a mix of CUDA library versions (some 10 and some 11). It is a System76 machine, and I have CUDA 10.1 installed from System76 (so it works with the System76 NVIDIA driver). When running TensorFlow, the following errors occur:

2021-01-07 18:12:22.584886: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-07 18:12:22.584906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-01-07 18:12:23.640665: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-07 18:12:23.641412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-07 18:12:23.669966: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-07 18:12:23.670257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1060 computeCapability: 6.1
coreClock: 1.733GHz coreCount: 10 deviceMemorySize: 5.93GiB deviceMemoryBandwidth: 178.99GiB/s
2021-01-07 18:12:23.670328: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.670379: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.670425: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.671387: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-07 18:12:23.671667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-07 18:12:23.673022: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-01-07 18:12:23.673100: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-01-07 18:12:23.673245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-07 18:12:23.673259: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.

Notice that all the warnings are about loading version 11 CUDA libraries, and only for some of the libraries. The version 10 ones load fine.
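To pull the exact list of libraries TensorFlow failed to open out of a long log like the one above, a small bash filter is enough. This is a minimal sketch: `missing_tf_libs` is a hypothetical helper name, and it assumes the `Could not load dynamic library '...'` message format shown in this log.

```shell
#!/usr/bin/env bash
# Read a TensorFlow log on stdin and print each library it could not dlopen,
# once per name, in a stable (C locale) order.
missing_tf_libs() {
  sed -n "s/.*Could not load dynamic library '\([^']*\)'.*/\1/p" | LC_ALL=C sort -u
}
```

For example, `python3 train.py 2> tf.log; missing_tf_libs < tf.log`, where `train.py` stands in for whatever script produced the warnings. Fed the log above, it prints libcublas.so.11, libcublasLt.so.11, libcudart.so.11.0 and libcusparse.so.11.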

This is the output of nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

This is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0    26W /  N/A |    585MiB /  6069MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2999      G   /usr/lib/xorg/Xorg                101MiB |
|    0   N/A  N/A      3479      G   /usr/lib/xorg/Xorg                255MiB |
|    0   N/A  N/A      3720      G   /usr/bin/gnome-shell               88MiB |
|    0   N/A  N/A      6487      G   ...AAAAAAAA== --shared-files       45MiB |
|    0   N/A  N/A      6959      G   ...AAAAAAAA== --shared-files       40MiB |
|    0   N/A  N/A     11642      G   ...AAAAAAAA== --shared-files       21MiB |
|    0   N/A  N/A     25206      G   WickrMe                            17MiB |
+-----------------------------------------------------------------------------+

I see that the CUDA Version shown by nvidia-smi is 11.1, but as I understand it, that has nothing to do with the CUDA runtime; it is simply the highest CUDA version the driver supports. Correct me if I'm wrong.
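The two numbers being compared here can be read side by side. The bash sketch below (helper names are hypothetical) extracts the `CUDA Version` field from the nvidia-smi header, which is the newest CUDA release the driver can serve, and the `release` field from `nvcc --version`, which is the toolkit actually on PATH, then compares them with GNU `sort -V`. It assumes the output formats shown above.

```shell
#!/usr/bin/env bash
# Highest CUDA version the installed driver supports (nvidia-smi output on stdin).
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Version of the CUDA toolkit that nvcc belongs to (nvcc --version output on stdin).
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Succeeds if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  printf '%s\n' "$2" "$1" | sort -V -C
}
```

Usage, assuming both tools are on PATH: `version_ge "$(nvidia-smi | driver_cuda_version)" "$(nvcc --version | toolkit_cuda_version)"`. On this machine that compares 11.1 against 10.1 and succeeds, i.e. the driver can run any toolkit up to 11.1.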

I have to use version 10 because that is what System76 supports, and it worked fine before the upgrade. I have also tried uninstalling and reinstalling TensorFlow via pip3, with no luck.

Does anyone know how to get all the libraries in sync at version 10.1? I also tried manually putting the version 11 libraries in place and letting TensorFlow use the mixed versions (which of course is a bad idea), but it won't recognize them (or I didn't place them properly).

As @talonmies pointed out, I was misunderstanding the versioning system: the TensorFlow build I had installed expects the CUDA 11 libraries (hence the requests for libcudart.so.11.0), and the libraries that seemed to load as "version 10" (libcufft.so.10, libcurand.so.10, libcusolver.so.10) carry their own library version in the file name, not the CUDA release number. However, because this is a System76 machine, things were also confusing: System76 uses its own NVIDIA driver, and installing CUDA 11 and cuDNN on it is not straightforward. I'm posting the answer in case anyone else runs into problems with System76.

First, DO NOT use the System76 install for CUDA and cuDNN. System76 provides its own versions (on its website) so that they are compatible with its NVIDIA driver, but they will not work here: they are version 10, and TF 2.4+ requires CUDA 11. Also, most general CUDA guides tell you to uninstall/reinstall the NVIDIA driver first to get a clean install, but DO NOT do this on a System76 system. Just leave the System76 driver alone. Finally, if you have any previous CUDA/cuDNN installs, remove all of them.
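To see which versioned toolkit directories are left over before removing them, a listing like the bash sketch below can help. `list_cuda_toolkits` is a hypothetical helper; it only looks for `cuda-X.Y` directories under a prefix, `/usr/local` by default (where the runfile installer used later puts CUDA 11.0). The prefix is a parameter because distribution packages may install CUDA elsewhere, and cuDNN files installed into system library directories will not show up here.

```shell
#!/usr/bin/env bash
# List versioned CUDA toolkit directories (cuda-X.Y) under a prefix,
# /usr/local by default, so leftovers from earlier installs are easy to spot.
list_cuda_toolkits() {
  local prefix="${1:-/usr/local}" dir
  for dir in "$prefix"/cuda-*; do
    # An unmatched glob stays literal, so only print real directories.
    if [ -d "$dir" ]; then
      echo "${dir##*/}"
    fi
  done
}
```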

Go to NVIDIA and get CUDA 11 and cuDNN. I used:

wget http://developer.download.nvidia.com/compute/cuda/11.0.2/local_installers/cuda_11.0.2_450.51.05_linux.run

Run it with:

sudo sh cuda_11.0.2_450.51.05_linux.run

When it runs, it will tell you that there is a conflict with the existing driver package. Ignore that and proceed. When you get to the install menu, UNCHECK "install driver" and continue with the install. When it's done, add the following to your PATH:

/usr/local/cuda-11.0:/usr/local/cuda-11.0/bin:

You need to add both the CUDA root and its bin directory, not just bin (which differs from most general instructions). Then source your .bashrc or .profile, or wherever you put the PATH addition (or open a new terminal).
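The PATH addition above can be written as a small bash function for `.bashrc`, so that re-sourcing the file does not keep appending duplicates. This is a sketch, not part of the original answer: `add_cuda_to_path` is a hypothetical name, and the default root assumes the `/usr/local/cuda-11.0` location used above. It prepends in the same order as the answer's line, root first, then bin.

```shell
#!/usr/bin/env bash
# Prepend a CUDA install's root and bin directories to PATH, skipping any that
# are already present. The root defaults to the location used in the answer.
add_cuda_to_path() {
  local root="${1:-/usr/local/cuda-11.0}" dir
  # Prepend bin first, then root, so the result is root:bin:<old PATH>.
  for dir in "$root/bin" "$root"; do
    case ":$PATH:" in
      *":$dir:"*) ;;            # already on PATH, leave it alone
      *) PATH="$dir:$PATH" ;;
    esac
  done
  export PATH
}
```

With the function and a call to `add_cuda_to_path` in `~/.bashrc`, running `source ~/.bashrc` twice leaves PATH unchanged the second time.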

Now install cuDNN:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb

Install it with dpkg, for example (in my case):

sudo dpkg -i libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb
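Before rerunning TensorFlow, one way to check that the libraries from the original warnings are now resolvable is to look them up in the dynamic linker cache. A bash sketch with a hypothetical helper name; note that `ldconfig -p` only covers directories registered with ldconfig, so a library reachable only through `LD_LIBRARY_PATH` would be reported as missing even though TensorFlow might still load it.

```shell
#!/usr/bin/env bash
# Read a linker cache listing (the output of `ldconfig -p`) on stdin and report
# which of the given sonames it lacks. Returns 1 if any are missing.
libs_missing_from_cache() {
  local cache soname status=0
  cache="$(cat)"
  for soname in "$@"; do
    # Cache lines look like "<tab>libfoo.so.1 (libc6,x86-64) => /path", so the
    # soname followed by a space distinguishes libcublas from libcublasLt.
    if ! grep -qF "$soname " <<<"$cache"; then
      echo "missing: $soname"
      status=1
    fi
  done
  return "$status"
}
```

For example: `ldconfig -p | libs_missing_from_cache libcudart.so.11.0 libcublas.so.11 libcublasLt.so.11 libcusparse.so.11 libcudnn.so.8`.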

That's it. Once I completed all that, everything worked fine. I hope this helps some System76 users get through Ubuntu 20.04 and CUDA 11 a little more easily.

Thank you very much. One of the reasons I use Pop!_OS is that the NVIDIA driver + CUDA/cuDNN just worked with TensorFlow, until this issue with version 11.0 being missing.

One thing I needed in order to install CUDA 11.0 using the recipe above was gcc/g++ version 8:

sudo apt -y install gcc-8 g++-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 8
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 8

I really wish Pop!_OS would provide CUDA 11.0 packages directly...

