Running TensorFlow on a GPU cluster in virtualenv
I installed the GPU version of TensorFlow in a virtualenv following these instructions. The problem is that I get a segmentation fault when starting a session. That is, this code:
import tensorflow as tf
sess = tf.InteractiveSession()
exits with the following error:
(tesnsorflowenv)user@machine$ python testtensorflow.py
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 40
Segmentation fault
I tried to dig deeper using gdb, but only got the following additional output:
[New Thread 0x7fffdf880700 (LWP 32641)]
[New Thread 0x7fffdf07f700 (LWP 32642)]
... lines omitted
[New Thread 0x7fffadffb700 (LWP 32681)]
[Thread 0x7fffadffb700 (LWP 32681) exited]
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Any ideas what is happening here and how to fix it?
Here is the output of nvidia-smi:
+------------------------------------------------------+
| NVIDIA-SMI 352.63 Driver Version: 352.63 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000:06:00.0 Off | 0 |
| N/A 65C P0 142W / 149W | 235MiB / 11519MiB | 81% E. Process |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 On | 0000:07:00.0 Off | 0 |
| N/A 25C P8 30W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 On | 0000:0D:00.0 Off | 0 |
| N/A 27C P8 26W / 149W | 55MiB / 11519MiB | 0% Prohibited |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 On | 0000:0E:00.0 Off | 0 |
| N/A 25C P8 28W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 On | 0000:86:00.0 Off | 0 |
| N/A 46C P0 85W / 149W | 206MiB / 11519MiB | 97% E. Process |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 On | 0000:87:00.0 Off | 0 |
| N/A 27C P8 29W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 On | 0000:8D:00.0 Off | 0 |
| N/A 28C P8 26W / 149W | 55MiB / 11519MiB | 0% Prohibited |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 On | 0000:8E:00.0 Off | 0 |
| N/A 23C P8 30W / 149W | 55MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
Thanks for any help on this issue!
It's not finding cuDNN:
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
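The diagnosis comes straight from the dso_loader lines: every CUDA library reports "successfully opened" except cuDNN. For a longer log, a small hypothetical helper (`summarize_dso_log` is not part of TensorFlow) can pull out both message forms. The regular expressions assume the exact wording shown in this log, which may differ in other TensorFlow versions.

```python
import re

# Patterns for the two dso_loader message forms in the question's log.
_OPENED = re.compile(r"successfully opened CUDA library (\S+) locally")
# The failure line ends the soname with a period before "LD_LIBRARY_PATH".
_FAILED = re.compile(r"Couldn't open CUDA library (\S+?)\.?\s+LD_LIBRARY_PATH")


def summarize_dso_log(text):
    """Return (opened, failed) lists of CUDA library names found in a log."""
    return _OPENED.findall(text), _FAILED.findall(text)


log = """\
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:93] Couldn't open CUDA library libcudnn.so.6.5. LD_LIBRARY_PATH: :/vol/cuda/7.0.28/lib64
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1382] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
"""

opened, failed = summarize_dso_log(log)
print("opened: %s" % opened)  # opened: ['libcublas.so.7.0', 'libcufft.so.7.0']
print("failed: %s" % failed)  # failed: ['libcudnn.so.6.5']
```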
You need to install cuDNN (this build looks for libcudnn.so.6.5) in a directory the dynamic loader searches. Please see the TensorFlow CUDA installation instructions.
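To check whether the loader can see cuDNN without starting TensorFlow, you can try loading the library with Python's `ctypes`. This is a sketch that assumes TensorFlow's loader resolves these sonames through the same system `dlopen` search path. The library names are copied from the log above, and `can_load` is a hypothetical helper. Run it from the same shell (and virtualenv) you start TensorFlow from, since `LD_LIBRARY_PATH` is read when the process starts.

```python
import ctypes


def can_load(soname):
    """Try to dlopen a shared library by name; return (ok, error_message)."""
    try:
        ctypes.CDLL(soname)
        return True, None
    except OSError as exc:  # raised when the library cannot be found or loaded
        return False, str(exc)


# Library names copied from the TensorFlow log in the question.
for name in ["libcudnn.so.6.5", "libcublas.so.7.0",
             "libcufft.so.7.0", "libcurand.so.7.0"]:
    ok, err = can_load(name)
    print("%s: %s" % (name, "OK" if ok else "NOT FOUND (%s)" % err))
```

A "NOT FOUND" for `libcudnn.so.6.5` alongside "OK" for the others would match the log in the question.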
After untarring the cuDNN archive:
[root@localhost cudnn]# cd include/
[root@localhost include]# mv cudnn.h /usr/local/cuda/include/
[root@localhost include]# cd ../lib64/
[root@localhost lib64]# mv * /usr/local/cuda/lib
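Moving the files only helps if the target directory is one the loader searches. On glibc-based Linux, a common per-user fix is to put that directory on `LD_LIBRARY_PATH` before launching Python. The helper below is hypothetical (not from either answer) and just builds such a value. It also drops empty components, such as the leading `:` in `:/vol/cuda/7.0.28/lib64` from the question's log, which the glibc loader treats as the current directory.

```python
import os


def with_library_dir(directory, current=None):
    """Return an LD_LIBRARY_PATH value with `directory` placed first.

    Empty components are dropped (the glibc loader reads them as the
    current directory), and any existing copy of `directory` is removed
    so it appears only once.
    """
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(":") if p and p != directory]
    return ":".join([directory] + parts)


print(with_library_dir("/usr/local/cuda/lib", ":/vol/cuda/7.0.28/lib64"))
# /usr/local/cuda/lib:/vol/cuda/7.0.28/lib64
```

Export the printed value in the shell (for example `export LD_LIBRARY_PATH=...`) before running `python`. Assigning `os.environ["LD_LIBRARY_PATH"]` inside an already running interpreter does not change where the glibc loader searches.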
And it works:
[root@localhost ~]# python
Python 2.7.5 (default, Sep 15 2016, 22:37:39)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as f
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0 locally
>>>