
TensorFlow GPU application crashes Jupyter notebook kernel

We are running TensorFlow applications on a GPU from multiple Jupyter notebooks. Every once in a while one of the runs crashes its notebook, with only the notification "The kernel has crashed...".

When we placed the code into a plain Python .py file instead, the stderr output was:

F tensorflow/core/kernels/conv_ops_3d.cc:369] Check failed:   stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted

In another run, stderr reported:

F tensorflow/core/common_runtime/gpu/gpu_util.cc:296] GPU->CPU Memcpy failed

The problem is that the TensorFlow processes are grabbing a lot of memory. On Linux you can run top to see what is going on. On our machine, top showed each TensorFlow process holding 0.55t (over half a terabyte of virtual memory)!

When you run the process inside a Jupyter notebook and do not shut down the notebook, the kernel does not release the memory. At some point a new process will be unable to allocate memory and will die; if that process is running inside a notebook, all you see is that the kernel has died.
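As a quick sanity check from inside a notebook, you can inspect how much memory the kernel process itself holds. This is a minimal sketch using the third-party psutil package (our own addition, not something from the original setup):

import os
import psutil  # third-party: pip install psutil

proc = psutil.Process(os.getpid())  # the notebook kernel process
mem = proc.memory_info()
# rss = resident memory; vms = virtual memory (the huge "0.55t" top shows)
print(f"RSS: {mem.rss / 2**30:.2f} GiB, VMS: {mem.vms / 2**30:.2f} GiB")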

Can anyone help with this?

One suggestion is to place the following snippet before you import tensorflow:

import os
# Hide all GPUs from TensorFlow; must run before `import tensorflow`
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

Added after @Nicolas' comment:

Yes, this disables the GPU entirely, which is not what is wanted.
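If the GPU must stay enabled, an alternative worth trying is to stop TensorFlow from pre-allocating the whole card, so several notebooks can share it. A minimal sketch, assuming the TensorFlow 1.x API that the error paths above point to:

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing everything up front,
# and cap this process at a fraction of total GPU memory (0.3 is illustrative).
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.3
sess = tf.Session(config=config)

On TensorFlow 2.x the equivalent is calling tf.config.experimental.set_memory_growth(gpu, True) for each device returned by tf.config.experimental.list_physical_devices('GPU').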
