
TensorFlow GPU / CUDA installation on Ubuntu

I have set up an Ubuntu 18.04 machine and tried to make TensorFlow 2.2 with GPU support work with Python (I have an NVIDIA/CUDA graphics card). Even after following the documentation at https://www.tensorflow.org/install/gpu#linux_setup , it failed (see below for details about how it failed).

Question: would you have a canonical "todo" list (starting point: freshly installed Ubuntu server) on how to install tensorflow-gpu and make it work, with a few steps?

Notes:

  • I have read many similar forum posts, and I think that having a canonical "todo" (from a fresh Ubuntu install to having tensorflow-gpu working) would be interesting, with a few steps/bash commands

  • the documentation I used involved

     export LD_LIBRARY_PATH...
     # Add NVIDIA package repository
     sudo apt-key adv --fetch-keys http://developer.download...
     ...
     # Install CUDA and tools. Include optional NCCL 2.x
     sudo apt install cuda9.0 cuda...

    Even after a lot of trial and error (I am not pasting all the different errors here, it would be too long), at the end:

     import tensorflow

    always failed. Some of the errors included `ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory`. I have already read the relevant question here , or this very long (!) GitHub issue .

  • After some trial and error, import tensorflow works, but it doesn't use the GPU (see also Tensorflow not running on GPU ).

Well, I was facing the same problem. The first thing to do is to look up which CUDA version your TensorFlow version requires. In your case, TensorFlow 2.2 requires CUDA 10.1. The correct cuDNN version is also important; in your case it would be cuDNN 7.6. The installed Python version matters as well: for TensorFlow 2.2 I would recommend Python 3.5-3.8. If any one of those does not match, getting a fully compatible setup is almost impossible.
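The version lookup described above can be turned into a small check. The following is only a sketch: the table holds a few rows copied by hand from TensorFlow's tested build configurations, and the names `TESTED_CONFIGS` and `check_stack` are hypothetical helpers, not part of TensorFlow.

```python
import sys

# A few rows copied by hand from TensorFlow's "tested build configurations"
# table (assumption: extend this dict with the rows you actually need).
TESTED_CONFIGS = {
    "2.2": {"cuda": "10.1", "cudnn": "7.6", "python": ((3, 5), (3, 8))},
    "2.3": {"cuda": "10.1", "cudnn": "7.6", "python": ((3, 5), (3, 8))},
    "2.4": {"cuda": "11.0", "cudnn": "8.0", "python": ((3, 6), (3, 8))},
}


def check_stack(tf_version, cuda, cudnn, python=None):
    """Return a list of mismatch messages; an empty list means compatible."""
    if python is None:
        python = sys.version_info[:2]
    key = ".".join(tf_version.split(".")[:2])  # "2.2.0" -> "2.2"
    wanted = TESTED_CONFIGS.get(key)
    if wanted is None:
        return ["no table entry for TensorFlow " + key]

    problems = []
    if cuda != wanted["cuda"]:
        problems.append("CUDA {} found, {} expected".format(cuda, wanted["cuda"]))
    # Only major.minor of cuDNN is compared, e.g. "7.6.5" -> "7.6".
    cudnn_short = ".".join(cudnn.split(".")[:2])
    if cudnn_short != wanted["cudnn"]:
        problems.append("cuDNN {} found, {} expected".format(cudnn, wanted["cudnn"]))
    low, high = wanted["python"]
    if not (low <= tuple(python) <= high):
        problems.append("Python {}.{} is outside the tested range".format(*python))
    return problems
```

For example, `check_stack("2.2.0", "10.2", "7.6.5")` reports the CUDA mismatch before you spend time debugging import errors.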

So if you want a check list, here you go:

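  1. Install CUDA 10.1, for example by installing nvidia-cuda-toolkit. Check the version you actually get, since the distribution package may provide a different CUDA release than 10.1.
  2. Install the cuDNN version compatible with CUDA 10.1.
  3. Export the CUDA environment variables (for example LD_LIBRARY_PATH).
  4. Bazel is only needed if you build TensorFlow from source; in that case you will be asked for it if it is not installed.
  5. Install TensorFlow 2.2 using pip. I highly recommend using a virtual environment.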
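Once the steps above are done, a quick way to catch errors such as `libcublas.so...: cannot open shared object file` before importing TensorFlow is to ask the dynamic loader for the libraries directly. This is a minimal sketch; the file names in `ASSUMED_LIBS` are an assumption for a CUDA 10.1 / cuDNN 7 setup, and `find_missing` is a hypothetical helper.

```python
import ctypes

# Library names assumed for a CUDA 10.1 / cuDNN 7 installation; adjust the
# list to whatever names your TensorFlow build reports when it starts.
ASSUMED_LIBS = ["libcudart.so.10.1", "libcublas.so.10", "libcudnn.so.7"]


def find_missing(names, loader=ctypes.CDLL):
    """Return the names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            loader(name)  # uses the normal search path, incl. LD_LIBRARY_PATH
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(ASSUMED_LIBS)
    if missing:
        print("Not found on the loader path:", ", ".join(missing))
    else:
        print("All listed CUDA libraries could be opened")
```

If a library is reported as missing although it is installed, the directory that contains it is probably not on LD_LIBRARY_PATH.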

You can find the compatibility check list of Tensorflow and CUDA here

You can find the CUDA Toolkit here

Finally get cuDNN in the correct version here

That's all.

I faced this problem as well when using Google Cloud Platform for two deep learning projects. They provide servers with nothing but a freshly installed Ubuntu OS. Based on my experience, I recommend the following steps:

  • Look up the CUDA and cuDNN versions supported by the current TensorFlow release on the TensorFlow page .
  • Install the targeted CUDA version from the deb package retrieved from NVIDIA's CUDA page , and be careful: more recent CUDA versions might not work! This will automatically install the corresponding NVIDIA drivers.
  • Install the targeted cuDNN version from this page , and again be careful: a more recent cuDNN version might not work.
  • Install tensorflow-gpu using pip.

This should work. Your problem is probably that you are using a more recent CUDA version than the one targeted by the current TensorFlow release.
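To see which CUDA toolkit version is actually installed, you can read it from `nvcc --version`. The sketch below assumes that `nvcc` is on the PATH (it often lives in the toolkit's bin directory, which may need to be added first); `parse_nvcc_release` and `installed_cuda_version` are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_version():
    """Return the toolkit version reported by nvcc, or None if nvcc is absent."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit version:", installed_cuda_version())
```

Compare the printed version with the one targeted by your TensorFlow release.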

To install tensorflow-gpu, the guidelines provided on the official website are very tedious for beginners. Instead, we can follow these simple steps:

Note: the NVIDIA driver must be installed before this (you can verify this with the command nvidia-smi).

  1. Install Anaconda: https://www.anaconda.com/distribution/
  2. Create a virtual environment using the command "conda create -n envname"
  3. Then activate the environment using the command "conda activate envname"
  4. Finally, install TensorFlow using the command "conda install tensorflow-gpu"

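You can then check whether the GPU is used with the following code:

     import tensorflow as tf

     # gpu_device_name() returns an empty string when no GPU is available
     if tf.test.gpu_device_name():
         print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
     else:
         print("not using gpu")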
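In TensorFlow 2.x, the same check can also be written with `tf.config.list_physical_devices`, which lists every GPU instead of only the default one. This is a sketch rather than part of the original answer; `gpu_report` is a hypothetical helper, and the import is guarded so that a broken installation gives a readable message.

```python
def gpu_report(device_names):
    """Format a one-line summary of the GPUs that TensorFlow can see."""
    if not device_names:
        return "No GPU visible to TensorFlow"
    return "TensorFlow sees {} GPU(s): {}".format(
        len(device_names), ", ".join(device_names)
    )


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError as exc:
        # Typical for missing CUDA libraries, e.g. libcublas not found.
        print("TensorFlow could not be imported:", exc)
    else:
        gpus = tf.config.list_physical_devices("GPU")
        print(gpu_report([gpu.name for gpu in gpus]))
```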

You can find the tutorial at the link below: https://www.pugetsystems.com/labs/hpc/Install-TensorFlow-with-GPU-Support-the-Easy-Way-on-Ubuntu-18-04-without-installing-CUDA-1170/

I would suggest first checking the availability of the GPU using the nvidia-smi command.
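If you want to run that check from a script, nvidia-smi can print machine-readable output. The sketch below uses its `--query-gpu` and `--format=csv,noheader` options; `parse_smi_csv` and `query_gpus` are hypothetical helper names, and an empty result only means that the tool is missing or reported an error.

```python
import shutil
import subprocess


def parse_smi_csv(output):
    """Turn 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in output.strip().splitlines():
        if "," not in line:
            continue  # skip anything that is not a two-column CSV row
        name, driver = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append((name, driver))
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_smi_csv(result.stdout)


if __name__ == "__main__":
    for name, driver in query_gpus():
        print("{} (driver {})".format(name, driver))
```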

I faced the same issue and was able to resolve it by using a Docker container. You can install Docker by following Install Docker Engine on Ubuntu, or use the DigitalOcean guide (I used this one), How To Install and Use Docker on Ubuntu 18.04.

After that it is simple: just run one of the following commands, depending on your requirements:

NV_GPU='0' nvidia-docker run --runtime=nvidia -it -v /path/to/folder:/path/to/folder/for/docker/container nvcr.io/nvidia/tensorflow:17.11

NV_GPU='0' nvidia-docker run --runtime=nvidia -it -v /storage/research/:/storage/research/ nvcr.io/nvidia/tensorflow:20.12-tf2-py3

Here '0' is the GPU number; if you want to use more than one GPU, just use '0,1,2' and so on.

Hope this solves the issue.


 