TensorFlow GPU memory error: try/except does not catch the error

Question

I am trying to run a hyperparameter optimization (using spearmint) on a big network with lots of trainable variables. I am worried that when I try a network with too many hidden units, TensorFlow will throw a GPU memory error.

I was wondering whether there is a way to catch the GPU memory error thrown by TensorFlow and skip the batch of hyperparameters that causes it.

For example, I would like something like this:

import tensorflow as tf

# 100000 x 100000 float32 values is about 40 GB, far more than a typical GPU holds
dim = [100000, 100000]
X = tf.Variable(tf.truncated_normal(dim, stddev=0.1))

with tf.Session() as sess:
    try:
        tf.global_variables_initializer().run()
    except Exception as e:
        print(e)

When I run the code above to test the memory error exception, the code breaks, prints only the GPU memory error, and never reaches the except block.

Answer 1

Try this:

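import tensorflow as tf

try:
    # Pin the variable to the first GPU so the allocation happens there
    with tf.device("gpu:0"):
        a = tf.Variable(tf.ones((10000, 10000)))
        sess = tf.Session()
        # initialize_all_variables() is deprecated in favor of this call
        sess.run(tf.global_variables_initializer())
except Exception:
    print("Caught error")
    import pdb; pdb.set_trace()  # drop into the debugger to inspect the failure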

Source: https://github.com/Hak333m/stuff/blob/master/gpu_oom.py
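
To skip the failing hyperparameters instead of stopping, the same try/except idea could be wrapped around each trial. The following is only a sketch under assumptions: `evaluate_or_skip`, `fake_trial`, and `settings` are hypothetical names standing in for the real training code and the values suggested by spearmint, and `MemoryError` is used to simulate the failure so that the example runs without a GPU. `tf.errors.ResourceExhaustedError` is the exception class TensorFlow uses for out-of-memory conditions.

```python
# Sketch only: run_trial and the settings list are hypothetical stand-ins
# for the real model-building/training code and the spearmint suggestions.
try:
    import tensorflow as tf
    # ResourceExhaustedError is the class TensorFlow raises when it runs out of memory.
    OOM_ERRORS = (tf.errors.ResourceExhaustedError, MemoryError)
except ImportError:
    # Lets the sketch run on a machine without TensorFlow installed.
    OOM_ERRORS = (MemoryError,)


def evaluate_or_skip(run_trial, params):
    """Return the trial's score, or None if the trial ran out of memory."""
    try:
        return run_trial(params)
    except OOM_ERRORS as err:
        print("Skipping %r: %s" % (params, type(err).__name__))
        return None


def fake_trial(params):
    # Hypothetical trial: pretend that very wide layers do not fit in memory.
    if params["hidden_units"] > 4096:
        raise MemoryError("simulated out-of-memory")
    return 1.0 / params["hidden_units"]


settings = [{"hidden_units": n} for n in (256, 1024, 65536)]
results = [(p, evaluate_or_skip(fake_trial, p)) for p in settings]
# Keep only the settings that completed.
completed = [(p, score) for p, score in results if score is not None]
```

This only helps when the failure surfaces as a Python exception. If the process is aborted at the native level, as the question describes, no except block can run, and each trial would probably need to be launched in its own process instead.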
