
Install Tensorflow-GPU on WSL2

Has anyone successfully installed Tensorflow-GPU on WSL2 with NVIDIA GPUs? I have Ubuntu 18.04 on WSL2, but am struggling to get the NVIDIA drivers installed. Any help would be appreciated as I'm lost.

So I have just got this running.

The steps you need to follow are here. To summarise them:

  1. Sign up for the Windows Insider Program and get the development builds of Windows so that you have the latest version
  2. Install WSL 2
  3. Install Ubuntu from the Windows Store
  4. Install the WSL 2 CUDA driver on Windows
  5. Install the CUDA toolkit
  6. Install cuDNN (you can download the Linux version from Windows and then copy the file to Linux)
  7. If you are getting memory errors like 'cannot allocate memory' then you might need to increase the amount of memory WSL can get
  8. Then install tensorflow-gpu
  9. Pray it works

Bugs I hit along the way:

  • If you get an error when you open Ubuntu for the first time, you need to enable virtualisation in the BIOS
  • If you cannot run the ./Blackscholes example in the installation instructions, you might not have the right build of Windows! You must have the right version
  • If you are getting 'cannot allocate memory' errors when running TF, you need to give WSL more RAM. It can only access half of your RAM by default
    1. Create a .wslconfig file under your user directory in Windows with the amount of memory you want. Mine looks like:
[wsl2]
memory=16GB 
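
As a rough sketch of the step above, the same file contents could be generated from Python. The helper name `build_wslconfig` is made up for illustration; the `[wsl2]` section and the `memory` key are the ones shown in the answer. The result still has to be saved as `.wslconfig` in the Windows user directory, and WSL most likely needs a restart before the new limit applies.

```python
import configparser
import io


def build_wslconfig(memory: str = "16GB") -> str:
    """Return the text of a minimal .wslconfig that sets the WSL 2 memory limit.

    `build_wslconfig` is a hypothetical helper, not part of any WSL tooling.
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names exactly as written
    config["wsl2"] = {"memory": memory}

    buffer = io.StringIO()
    # Write "memory=16GB" instead of "memory = 16GB" to match the answer's file.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_wslconfig("16GB"))
```

Passing a different value, for example `build_wslconfig("8GB")`, only changes the `memory` line.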

Edit after running some code

This is much slower than when I was running on Windows directly. I went from 1 minute per epoch to 5 minutes. I'm just going to dual boot.

These are the steps I had to follow for Ubuntu 20.04. I am no longer on the dev channel; the beta channel works fine for this use case and is much more stable.

Install WSL2

Install Ubuntu 20.04 from the Windows Store

Install the Nvidia drivers for Windows from: https://developer.nvidia.com/cuda/wsl/download

Install nvcc inside of WSL with: sudo apt install nvidia-cuda-toolkit

Check that it is there with: nvcc --version
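
The same check can be scripted. This is only a sketch: the function names are hypothetical, and the parser assumes the output contains a fragment like `release 10.1`, which is how nvcc normally reports its version.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_output: str) -> Optional[str]:
    """Extract the CUDA release (for example '10.1') from `nvcc --version` text."""
    match = re.search(r"release\s+(\d+\.\d+)", version_output)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Return the release reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(installed_nvcc_release() or "nvcc not found on PATH")
```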

For my use case, I do data science and already had Anaconda installed. I created an environment with:

conda create --name tensorflow
conda activate tensorflow
conda install tensorflow-gpu

Then just test it with this little Python program with the environment activated:

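import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means no GPU was detected.
print(tf.config.list_physical_devices('GPU'))

# Report the CUDA and cuDNN versions this TensorFlow build was compiled against.
# .get() avoids a KeyError if the build does not report these keys.
sys_details = tf.sysconfig.get_build_info()
cuda = sys_details.get("cuda_version")
cudnn = sys_details.get("cudnn_version")
print(cuda, cudnn)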
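
Listing devices only shows that TensorFlow can see the GPU. A slightly stronger check, sketched below, runs a small matrix multiplication and reports where the result was placed. The function name `gpu_matmul_check` and the 256x256 size are arbitrary choices for illustration, and the function returns empty values if TensorFlow is not installed.

```python
def gpu_matmul_check():
    """Return (gpu_names, result_device); both are empty if TensorFlow is missing.

    Hypothetical helper: runs one small matmul and reports the device that
    holds the result, which should mention "GPU" when the GPU is in use.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return [], ""

    gpus = tf.config.list_physical_devices("GPU")
    a = tf.random.uniform((256, 256))
    product = tf.matmul(a, a)
    return [gpu.name for gpu in gpus], product.device


if __name__ == "__main__":
    names, device = gpu_matmul_check()
    print("GPUs:", names)
    print("matmul result placed on:", device)
```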

For reasons I do not understand, my machine was unable to find the GPU without nvcc installed, and it actually gave an error message saying it could not find nvcc.

The online tutorials I had found had you downloading CUDA and cuDNN separately, but I think the nvcc package includes cuDNN, since it is... there somehow.

I can confirm I am able to get this working without the need for Docker on WSL2 thanks to the following article:

https://qiita.com/Navier/items/cf551908bae707db4258

Be sure to update to driver version 460.15, not 455.41 as listed in the CUDA documentation.
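
For anyone scripting this check, the comparison has to be numeric, not a plain string comparison. A minimal sketch, assuming the driver version is already available as a string (how you obtain it is up to you; `driver_at_least` is a made-up name):

```python
def driver_at_least(installed: str, required: str = "460.15") -> bool:
    """Compare dotted driver versions numerically, part by part."""

    def as_tuple(version: str):
        return tuple(int(part) for part in version.split("."))

    return as_tuple(installed) >= as_tuple(required)


if __name__ == "__main__":
    for version in ("455.41", "460.15"):
        print(version, "ok" if driver_at_least(version) else "too old")
```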

Note, this does not work with the card in TCC mode (only WDDM). Also, be sure to place your files on the Linux file system (i.e. not on a mounted drive, like /mnt/c/). Performance is significantly faster on the Linux file system (this has to do with the difference in implementation of WSL 1 vs. WSL 2; see 1, 2, and 3).
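
A quick way to catch a dataset that was left on the Windows side is to test the path before training. This sketch assumes Windows drives are mounted at `/mnt/<drive letter>`, which is the WSL default but can be changed; `on_windows_mount` is a hypothetical helper, and it does not resolve relative paths or symlinks.

```python
import re
from pathlib import PurePosixPath

# Matches /mnt/c, /mnt/c/..., /mnt/d/... but not /mnt/wsl or /mnt/data.
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-z](/|$)")


def on_windows_mount(path: str) -> bool:
    """Return True if an absolute path looks like a mounted Windows drive."""
    return bool(_WINDOWS_MOUNT.match(str(PurePosixPath(path))))


if __name__ == "__main__":
    for candidate in ("/mnt/c/Users/me/data", "/home/me/data"):
        location = "Windows mount" if on_windows_mount(candidate) else "Linux file system"
        print(candidate, "->", location)
```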

NOTE: See also "Is the class generator (inheriting Sequence) thread safe in Keras/Tensorflow?"

I just want to point out that using Anaconda to install cudatoolkit and cudnn does not seem to work in WSL.

Maybe there is some problem with paths that makes TF look for the needed files only in the system paths instead of the conda environments.
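
One way to test that guess is to check whether the active conda environment's `lib` directory is on the dynamic loader's search path. This is a diagnostic sketch only: `conda_lib_on_library_path` is a made-up name, and nothing in this thread confirms that adding the directory to `LD_LIBRARY_PATH` fixes the problem.

```python
import os
import posixpath
from typing import Mapping


def conda_lib_on_library_path(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if $CONDA_PREFIX/lib is listed in LD_LIBRARY_PATH."""
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return False  # no conda environment is active
    conda_lib = posixpath.normpath(posixpath.join(prefix, "lib"))
    search_dirs = [
        posixpath.normpath(entry)
        for entry in env.get("LD_LIBRARY_PATH", "").split(":")
        if entry
    ]
    return conda_lib in search_dirs


if __name__ == "__main__":
    print("conda lib dir on LD_LIBRARY_PATH:", conda_lib_on_library_path())
```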
