
How to fix TensorFlow error 'GPU sync failed' on NVIDIA Tegra TX2

I am trying to run predictions with a model built in Keras on my NVIDIA Tegra TX2 using TensorFlow and Python 2.7, and at seemingly random times TensorFlow raises the following exception:

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4504 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-10-04 16:17:50.786531: E tensorflow/stream_executor/cuda/cuda_driver.cc:1032] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN: unknown error:: *** Begin stack trace ***
    stream_executor::gpu::GpuDriver::SynchronizeContext(stream_executor::gpu::GpuContext*)
    stream_executor::StreamExecutor::SynchronizeAllActivity()
    tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

... ...

tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

Sometimes the problem goes away after a few reboots or some waiting time and I can run the prediction again, but 8 out of 10 times this error appears.

I've already tried the following:

  • Change the query amount and memory usage as follows:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.7
session = tf.Session(config=config, ...)

I would be really grateful for any further suggestions.

Since the problem is intermittent, it may occur when TensorFlow cannot get the memory it needs. It is a known issue, and you have already tried the basic troubleshooting steps. Since you are still seeing it, try the following steps as well:

Reinstall libhdf5-dev and python-h5py:

sudo apt-get install libhdf5-dev
sudo apt-get install python-h5py

Then enable GPU memory growth (allow_growth) as described in https://github.com/keras-team/keras/issues/4161#issuecomment-366031228 :

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)
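The answer above suspects that TensorFlow is not getting the memory it needs. On the TX2 the GPU has no dedicated memory and shares system RAM with the CPU, so one quick thing to check before creating the session is how much memory the system reports as available. The sketch below is only an illustration: the helper names and the `required_mb` threshold are hypothetical, and `MemAvailable` is the kernel's estimate, which may differ from what the GPU allocator can actually obtain.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Values are the raw integers from the file (kilobytes for most fields).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mb(text):
    """Return MemAvailable in whole MB, or None if the field is missing."""
    kb = parse_meminfo(text).get("MemAvailable")
    return None if kb is None else kb // 1024


def enough_memory(text, required_mb):
    """True if at least required_mb (a hypothetical threshold) is available."""
    mb = available_mb(text)
    return mb is not None and mb >= required_mb


# Usage on the device (reads the real file):
#     with open("/proc/meminfo") as f:
#         print(available_mb(f.read()))
```

If the reported value is well below what the model needs, closing other processes before starting the prediction is worth trying before changing any TensorFlow settings.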

Adding the following lines solved the issue for me. Note that my TensorFlow version is 2.1.

import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
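For TensorFlow 2.x there is also a way to request the same behaviour without the `compat.v1` session, using `tf.config.experimental.set_memory_growth`. The following is a minimal sketch and has not been verified on a TX2. The helper name `enable_memory_growth` and its `tf_module` parameter are assumptions made for illustration; passing the module in only serves to make the function easy to exercise on a machine without TensorFlow.

```python
def enable_memory_growth(tf_module=None):
    """Enable memory growth on every visible GPU; return how many were set.

    tf_module is a hypothetical hook for testing; when omitted, the real
    tensorflow package is imported.
    """
    if tf_module is None:
        import tensorflow as tf_module  # imported lazily, only when needed

    gpus = tf_module.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the GPU is initialized (before any op runs),
        # otherwise TensorFlow raises a RuntimeError.
        tf_module.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


# Usage, at the very top of the script and before building or loading a model:
#     enable_memory_growth()
```

Like `allow_growth` in the snippets above, this makes TensorFlow allocate GPU memory on demand instead of reserving most of it at startup.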
