Segmentation fault (core dumped) in Ubuntu 18.04 using an RTX 2080 Ti

I recently bought an RTX 2080 Ti to run some deep learning projects locally. I have tried several times to install tensorflow-gpu on Ubuntu 18.04, and the only guide that appears to work is this one: https://www.pugetsystems.com/labs/hpc/Install-TensorFlow-with-GPU-Support-the-Easy-Way-on-Ubuntu-18-04-without-installing-CUDA-1170/#look-at-the-job-run-with-tensorboard

However, as soon as I run a script, the following error appears:

Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
2019-01-09 14:49:06.748318: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-09 14:49:07.730143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-09 14:49:07.732970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.23GiB
2019-01-09 14:49:07.733071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-09 14:49:30.666591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-09 14:49:30.666636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-09 14:49:30.666646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-09 14:49:30.667094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9875 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/15
Segmentation fault (core dumped)

Could anyone give me some advice on how to make TensorFlow work properly with my GPU?

Thank you.

You can try the following.

My setup: RTX 2080, Ubuntu 16.04.

You need to install:

cuda 10.0
cuDNN v7.4.1.5
libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64
libcudnn7-doc_7.4.1.5-1+cuda10.0_amd64
libcudnn7_7.4.1.5-1+cuda10.0_amd64
nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   39C    P0    N/A /  N/A |      0MiB /  7951MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

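nvidia-smi shows CUDA 10.1, but that is not the installed toolkit version. It is the highest CUDA version the driver supports; the toolkit actually installed is the one nvcc reports (10.0).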

nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
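The two version numbers above can be compared programmatically. The following is a minimal sketch, not part of the original answer: the function names `toolkit_release` and `driver_cuda_version` are hypothetical, and the regular expressions assume the output formats shown in the two listings above.

```python
import re


def toolkit_release(nvcc_output):
    """Return the CUDA toolkit release printed by `nvcc --version`, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def driver_cuda_version(smi_output):
    """Return the 'CUDA Version' field printed by nvidia-smi, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


# Sample lines copied from the listings in this answer.
nvcc_text = "Cuda compilation tools, release 10.0, V10.0.130"
smi_text = "| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |"

print("installed toolkit:", toolkit_release(nvcc_text))
print("driver supports up to:", driver_cuda_version(smi_text))
```

On a real machine the two strings could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and the equivalent call for `nvidia-smi`, assuming both tools are on the PATH.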

You can get everything step by step here:

1. NVIDIA Linux driver: https://www.nvidia.com/Download/index.aspx?lang=en-us
2. CUDA: https://developer.nvidia.com/cuda-downloads
3. cuDNN: https://developer.nvidia.com/rdp/cudnn-download
4. Install: libcudnn7, libcudnn7-dev, libcudnn7-doc
5. Install: nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb

To download libcudnn and nvidia-machine-learning:

https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/

I'm using:

tensorflow (1.13.1)
tensorflow-gpu (1.13.1)
tf-nightly-gpu (1.14.1.dev20190509)
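A mismatch between the TensorFlow build and the installed CUDA toolkit is one common cause of crashes like this, so it can help to check the pairing explicitly. The sketch below is an illustration only: the `TESTED_CUDA` table and both function names are hypothetical, and the version pairs are assumptions based on TensorFlow's published tested build configurations, which should be checked against the current table.

```python
# Assumed TensorFlow -> CUDA pairs (from TensorFlow's tested build
# configurations table); verify before relying on them.
TESTED_CUDA = {
    "1.12": "9.0",
    "1.13": "10.0",
    "1.14": "10.0",
}


def expected_cuda(tf_version):
    """Return the CUDA release assumed for a TensorFlow version, or None."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CUDA.get(major_minor)


def matches_toolkit(tf_version, toolkit_release):
    """True if the installed toolkit release matches the assumed pairing."""
    expected = expected_cuda(tf_version)
    return expected is not None and expected == toolkit_release


# tensorflow-gpu 1.13.1 with the CUDA 10.0 toolkit reported by nvcc.
print(matches_toolkit("1.13.1", "10.0"))
```

For versions missing from the table the check returns False rather than guessing, which keeps an unknown combination from being reported as compatible.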

In your code, put the following at the top of the script (this is how I got the GPU working for an LSTM in TensorFlow):

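import tensorflow as tf
import keras

# Allocate GPU memory on demand instead of reserving it all at start-up
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
keras.backend.set_session(sess)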
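If the segmentation fault still occurs after these changes, it may help to find out which Python call triggers it. This is not part of the original answer; it is a small sketch using Python's standard `faulthandler` module, which prints the Python traceback when the process receives a fatal signal such as SIGSEGV.

```python
import faulthandler

# Enable before importing TensorFlow/Keras so that a crash inside native
# code still prints the Python stack of every thread to stderr.
faulthandler.enable(all_threads=True)

# ... the rest of the training script (imports, session setup, model.fit) ...
```

The same effect is available without editing the script by running it as `python -X faulthandler script.py`, where `script.py` stands for your own file.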
