I am struggling to get TensorFlow to run on the GPU of my MSI Windows 10 machine with an NVIDIA GeForce GTX 960M. I think I have already tried every hint available on the internet on this topic and still cannot succeed, so my question is: can you give me any additional hint that could help me achieve the goal of running TensorFlow on the GPU?
To be more specific:
I downloaded and installed the CUDA Toolkit 8.0 (the installer cuda_8.0.61_win10.exe and the patch cuda_8.0.61.2_windows.exe). I ran both with the default options. Then, to check whether the installation was successful, I compiled deviceQuery from the CUDA Samples and ran it successfully. See the results below:
<pre>
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64\Debug>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 960M"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1176 MHz (1.18 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 960M
Result = PASS
</pre>
...so it looks OK, at least to me. Then I downloaded and unpacked cuDNN v5.1, and I manually added the path to the library's DLL to the PATH environment variable. I also checked that my graphics card is on the list of compatible devices, and it is.
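Whether Python can actually load that DLL from the PATH can be verified with a short script. This is only an illustrative sketch, not part of my original setup; it assumes the standard name cudnn64_5.dll that cuDNN 5.1 ships for 64-bit Windows:

```python
import ctypes
import sys

def dll_loads(dll_name):
    """Return True if Windows can load the given DLL from the PATH."""
    if not sys.platform.startswith("win"):
        return False  # this check only makes sense on Windows
    try:
        ctypes.WinDLL(dll_name)
        return True
    except OSError:
        return False

# cuDNN 5.1 for 64-bit Windows ships cudnn64_5.dll
print(dll_loads("cudnn64_5.dll"))
```

If this prints False in the same environment where TensorFlow runs, the DLL is not reachable via the PATH that Python sees, regardless of what the system PATH shows in a fresh terminal.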
Then I installed TensorFlow with the following command:
<pre>
pip install tensorflow-gpu
</pre>
It was installed without any error messages. The last message was:
Successfully installed tensorflow-1.3.0 tensorflow-gpu-1.3.0
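One way to see which devices this TensorFlow build actually registered at import time is to query device_lib (TF 1.x API). A minimal sketch, wrapped so it degrades gracefully if TensorFlow cannot be imported:

```python
def visible_device_names():
    """Return the device names TensorFlow registered, or [] if TF is absent."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [d.name for d in device_lib.list_local_devices()]

# A working tensorflow-gpu install should list a GPU device alongside the CPU
# (something like '/gpu:0' in TF 1.3 naming); if only the CPU appears, the
# CUDA/cuDNN DLLs were not loaded when tensorflow was imported.
print(visible_device_names())
```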
The program was:
<pre>
import tensorflow as tf

device_name = "/gpu:0"  # ...it works fine with "/cpu:0"; it doesn't with "/gpu:0"

with tf.device(device_name):
    ran_matrix = tf.random_uniform(shape=(1, 1), minval=0, maxval=1)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    result = sess.run(ran_matrix)
    print(result)
</pre>
...and the result was (unfortunately) the error shown below. I ran the program from PyCharm.
The most important error message was:
<pre>
File "C:\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'random_uniform/sub': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: random_uniform/sub = Sub[T=DT_FLOAT, _device="/device:GPU:0"](random_uniform/max, random_uniform/min)]]
</pre>
Additionally, I tried running the same program on the CPU instead of the GPU by changing the line to device_name = "/cpu:0", and it worked fine.
I searched the internet for hints about what might be wrong, but I cannot find a specific answer (most discussions concern problems on Ubuntu, while I am using Windows 10 and cannot change that).
Where should I start to get the problem solved?
I've just solved the problem by reinstalling tensorflow-gpu and all dependent libraries. (I had already tried this a month ago, but at that time it did not work; now it finally did.) Some of the dependent libraries certainly had newer versions by now, but I cannot say which one was the root cause of the problem.
Check this: https://github.com/tensorflow/tensorflow/issues/12416
I faced the same issue after updating TF from 1.2 to 1.3, and fixed it by updating to cuDNN v6.0.