
How to make tensorflow use GPU on a Windows machine?

I am struggling to make TensorFlow run on the GPU of my MSI Windows 10 machine with an NVIDIA GeForce GTX 960M. I think I have already tried every hint available on the internet on this topic and still cannot succeed, so my question is: can you give me any additional hint that could help me achieve the goal, which is running TensorFlow on a GPU?

To be more specific, here is what I have done so far:

First, I downloaded and installed CUDA Toolkit 8.0 (the installer cuda_8.0.61_win10.exe and the patch cuda_8.0.61.2_windows.exe). I executed both with the standard options. Then, to check whether the installation was successful, I compiled deviceQuery from the CUDA Samples and ran it successfully. See the results below:

<pre>
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64\Debug>deviceQuery.exe
deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 960M"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147483648 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1176 MHz (1.18 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 960M
Result = PASS
</pre>

...so it looks OK, at least to me. Then I downloaded and unpacked cuDNN v5.1 and manually added the path to that library's DLL to the PATH system variable. I also checked that my graphics card was on the list of compatible devices, and it was.
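Since the cuDNN DLL is located via PATH when TensorFlow is imported, it is worth confirming that the directory you added is really visible to the current process. Below is a small sketch of such a check; the DLL name cudnn64_5.dll is what cuDNN v5.1 ships on Windows, so adjust it if your archive differs:

```python
# Sanity-check sketch: confirm that a given DLL can be found via the PATH
# environment variable of the current process.
import os

def find_on_path(filename):
    """Return the first PATH directory containing `filename`, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return directory
    return None

# cudnn64_5.dll is the DLL name used by cuDNN v5.1 on Windows.
print(find_on_path("cudnn64_5.dll"))
```

If this prints None, TensorFlow will not find cuDNN either, no matter what the system-wide PATH says (PyCharm, for example, may run with a stale copy of the environment until it is restarted).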

  • Then I installed TensorFlow with the following command:

     pip install tensorflow-gpu

It was installed without any error messages. The last message was:

Successfully installed tensorflow-1.3.0 tensorflow-gpu-1.3.0
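Note that the success message lists both tensorflow-1.3.0 and tensorflow-gpu-1.3.0. Having the CPU-only `tensorflow` package installed alongside `tensorflow-gpu` may cause `import tensorflow` to pick up the CPU build; this is an assumption worth checking, and the following sketch shows which of the two distributions is actually installed:

```python
# Diagnostic sketch: check whether both the CPU-only `tensorflow` package and
# `tensorflow-gpu` are installed at the same time (which can shadow the GPU build).
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    from importlib_metadata import version, PackageNotFoundError  # older Pythons

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

for pkg in ("tensorflow", "tensorflow-gpu"):
    print(pkg, "->", installed_version(pkg))
```

If both report a version, uninstalling both and reinstalling only tensorflow-gpu is a reasonable first step.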
  • Next, I ran a simple Python program to check whether TensorFlow was working.

The program was:

import tensorflow as tf
device_name = "/gpu:0"  # ...it works fine with "/cpu:0"; it doesn't with "/gpu:0"
with tf.device(device_name):
    ran_matrix = tf.random_uniform(shape=(1,1), minval=0, maxval=1)
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    result = sess.run(ran_matrix)
    print(result)

...and the result was, unfortunately, an error. I executed the program from PyCharm.

[screenshot of the error output]

The most important error message was:

  File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'random_uniform/sub': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: random_uniform/sub = Sub[T=DT_FLOAT, _device="/device:GPU:0"](random_uniform/max, random_uniform/min)]]
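The error says the only available device is cpu:0, which means the GPU build never registered the card at all (typically because the CUDA/cuDNN DLLs failed to load, silently on some setups). One way to confirm what this TensorFlow build can actually see is `device_lib`, an internal but long-standing module; a hedged diagnostic sketch:

```python
# Diagnostic sketch: list every device visible to this TensorFlow build.
# If only a CPU entry appears, the GPU build (or its CUDA/cuDNN DLLs)
# did not load, and pinning ops to /gpu:0 will always fail.
from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()
for d in devices:
    print(d.name, d.device_type)
```

On a working GPU setup this lists both a `/cpu:0` and a `/gpu:0` entry; here it would show only the CPU.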

Additionally, I tried to run the same program on the CPU instead of the GPU by changing the device line to: device_name = "/cpu:0"

...and it worked fine...
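Instead of editing the device string by hand, the hard pin can also be relaxed with `allow_soft_placement=True`, which makes TensorFlow fall back to the CPU rather than raise InvalidArgumentError, while `log_device_placement=True` reports where each op actually ran. This does not fix the missing GPU, but it turns the crash into useful diagnostics. A sketch, written against the modern `tf.compat.v1` API (on the TF 1.3 install from the question, use plain `import tensorflow as tf` and drop the eager-execution line):

```python
# Workaround sketch: soft placement falls back to the CPU when the pinned
# device is unavailable, instead of raising InvalidArgumentError.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

with tf.device("/gpu:0"):
    ran_matrix = tf.random_uniform(shape=(1, 1), minval=0, maxval=1)

config = tf.ConfigProto(
    log_device_placement=True,  # print the device each op is assigned to
    allow_soft_placement=True,  # fall back to CPU instead of failing
)
with tf.Session(config=config) as sess:
    result = sess.run(ran_matrix)
    print(result)
```

With soft placement enabled, the placement log makes it obvious whether the op landed on the GPU or silently fell back to the CPU.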

I searched the internet for hints about what might be wrong here, but I could not find a specific answer (most discussions concern problems on Ubuntu, while I am using Windows 10 and cannot change that).

Where should I start to get the problem solved?

I've just solved the problem by reinstalling tensorflow-gpu and all of its dependent libraries. (I had already tried this a month ago, but at that time it did not work; now it finally did.) Some of the dependent libraries certainly had newer versions, but I cannot say which one was the root cause of the problem.

Check this: https://github.com/tensorflow/tensorflow/issues/12416

I faced the same issue after updating TF from 1.2 to 1.3, and fixed it by updating to cuDNN v6.0.
