Segmentation fault (core dumped) in Ubuntu 18.04 with RTX 2080 Ti

Question

I recently acquired an RTX 2080 Ti to run some deep learning projects locally. I've tried to install tensorflow-gpu in Ubuntu 18.04 several times, and the only guide that appears to work is the following: https://www.pugetsystems.com/labs/hpc/Install-TensorFlow-with-GPU-Support-the-Easy-Way-on-Ubuntu-18-04-without-installing-CUDA-1170/#look-at-the-job-run-with-tensorboard

However, when I start running a script, the following error shows up:

Using TensorFlow backend.
Train on 60000 samples, validate on 10000 samples
2019-01-09 14:49:06.748318: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-09 14:49:07.730143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-09 14:49:07.732970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
totalMemory: 10.73GiB freeMemory: 10.23GiB
2019-01-09 14:49:07.733071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-09 14:49:30.666591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-09 14:49:30.666636: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-09 14:49:30.666646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-09 14:49:30.667094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9875 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/15
Segmentation fault (core dumped)


Could anyone give me some feedback on how to make TensorFlow work properly with my GPU?

Thank you.

Answer 1

You can try the following.

I'm on: RTX 2080, Ubuntu 16.04

You need to install:

cuda 10.0
cuDNN v7.4.1.5
libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64
libcudnn7-doc_7.4.1.5-1+cuda10.0_amd64
libcudnn7_7.4.1.5-1+cuda10.0_amd64
nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64
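The cuDNN package names above encode both the cuDNN version and the CUDA release they were built for, so a mismatch is easy to spot before installing. The following is a minimal sketch that parses those names and checks that they all target the same CUDA release. The helper name `parse_cudnn_package` and the regular expression are illustrative assumptions inferred from the names listed above, not part of any NVIDIA tooling.

```python
import re

# Pattern for names like "libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64".
# The layout is inferred from the package names listed above (assumption).
PACKAGE_RE = re.compile(
    r"^(?P<name>libcudnn\d+(?:-[a-z]+)?)_"
    r"(?P<cudnn>\d+(?:\.\d+)+)-\d+\+cuda(?P<cuda>\d+\.\d+)_(?P<arch>\w+)$"
)


def parse_cudnn_package(filename):
    """Return (package, cudnn_version, cuda_version), or None if no match."""
    match = PACKAGE_RE.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("cudnn"), match.group("cuda")


packages = [
    "libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64",
    "libcudnn7-doc_7.4.1.5-1+cuda10.0_amd64",
    "libcudnn7_7.4.1.5-1+cuda10.0_amd64",
]

parsed = [parse_cudnn_package(p) for p in packages]
# Collect the distinct CUDA releases; more than one would indicate a mismatch.
cuda_versions = {cuda for _, _, cuda in parsed}

for name, cudnn, cuda in parsed:
    print(name, "cuDNN", cudnn, "built for CUDA", cuda)
print("CUDA releases targeted:", cuda_versions)
```

If the set printed at the end holds more than one release, the downloaded packages do not belong together.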

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:02:00.0 Off |                  N/A |
| 22%   39C    P0    N/A /  N/A |      0MiB /  7951MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvidia-smi shows CUDA 10.1, but that is not the installed toolkit version: its header reports the highest CUDA version the driver supports.

nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
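The two version numbers can be compared programmatically. The sketch below extracts the CUDA version from the nvidia-smi header (the highest version the driver supports) and the toolkit release from the `nvcc --version` output, using the exact lines shown above as sample input. The function names `driver_cuda_version` and `toolkit_cuda_version` are hypothetical helpers introduced for illustration only.

```python
import re


def driver_cuda_version(nvidia_smi_output):
    """CUDA version from the nvidia-smi header (highest the driver supports)."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_output)
    return match.group(1) if match else None


def toolkit_cuda_version(nvcc_output):
    """CUDA toolkit release from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Sample lines copied from the outputs in this answer.
smi_header = "| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |"
nvcc_line = "Cuda compilation tools, release 10.0, V10.0.130"

print("driver supports up to CUDA", driver_cuda_version(smi_header))
print("installed toolkit is CUDA", toolkit_cuda_version(nvcc_line))
```

On a real machine the sample strings could be replaced with the captured output of the two commands, for example via `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.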

Here is where to get everything, step by step:

1. NVIDIA Linux driver: https://www.nvidia.com/Download/index.aspx?lang=en-us
2. CUDA: https://developer.nvidia.com/cuda-downloads
3. cuDNN: https://developer.nvidia.com/rdp/cudnn-download
4. install: libcudnn7-dev, libcudnn7-doc, libcudnn7
5. install: nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb

To download libcudnn and nvidia-machine-learning:

https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/

I'm using:

tensorflow (1.13.1)
tensorflow-gpu (1.13.1)
tf-nightly-gpu (1.14.1.dev20190509)

In your code (this got the GPU working on an LSTM in TensorFlow for me), start the script with:

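import tensorflow as tf
import keras

# Allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
keras.backend.set_session(sess)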
