TensorFlow Python script getting killed

I am a beginner in TensorFlow. My TensorFlow script exits abruptly, printing Killed. My code is as follows:

import tensorflow as tf
# Load data X_train, y_train and X_valid, y_valid

# An image augmentation pipeline
def augment(x):
    x = tf.image.random_brightness(x, max_delta=0.2)
    x = tf.image.random_contrast(x, 0.5, 2)
    return x

from sklearn.utils import shuffle
X_train, y_train = shuffle(X_train, y_train)

def LeNet(x):
    # Define LeNet architecture
    return logits

# Features:
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
# Labels:
y = tf.placeholder(tf.int32, (None))
# Dropout probability
prob = tf.placeholder(tf.float32, (None))
# Learning rate
rate = tf.placeholder(tf.float32, (None))
rate_summary = tf.summary.scalar('learning rate', rate)

logits = LeNet(x)
accuracy_operation = ...  # definition of accuracy_operation omitted

accuracy_summary = tf.summary.scalar('validation accuracy', accuracy_operation)
saver = tf.train.Saver()

summary = tf.summary.merge_all()
writer = tf.summary.FileWriter('./summary', tf.get_default_graph())

def evaluate(X_data, y_data):
    # Return accuracy with X_data, y_data
    return accuracy

with tf.Session() as sess:

    saver.restore(sess, './lenet')

    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, len(X_train), BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            batch_x = sess.run(augment(batch_x))

            # Run the training operation, update learning rate

        validation_accuracy = evaluate(X_valid, y_valid)
        writer.add_summary(sess.run(summary, feed_dict = {x: X_valid, y: y_valid, prob: 1., rate: alpha}))

I have omitted the parts that I am sure are not causing the problem. I know they are fine because the script ran without any trouble before. After I added certain parts (mainly the summary writer operations), the script prints Killed and exits after executing a certain number of training operations. I suspect a memory leak, but I can't find it.

I ran into a similar problem just a few days ago. In my case some of my operations turned out to be computationally very heavy, as I learnt later. As soon as I reduced the size of my tensors, the message disappeared and my code ran. I can't tell exactly what causes the issue in your case, but from my experience and from what you describe (the error only appears after adding the summary), I would suggest experimenting with the size of your X_valid and y_valid. It might just be that the writer can't cope with that much data.
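One way to try this without throwing validation data away is to feed the validation set in chunks instead of passing all of X_valid to a single sess.run call. The following is only a minimal sketch, not a confirmed fix for your script: batched_mean and score_fn are hypothetical names I made up, and score_fn stands in for whatever evaluates your accuracy op on one batch.

```python
def batched_mean(score_fn, features, labels, batch_size=128):
    """Average a per-batch score over a dataset, one chunk at a time.

    score_fn is a hypothetical callable taking (batch_x, batch_y) and
    returning the mean score for that batch. Each batch score is weighted
    by the batch length so a smaller final batch is counted correctly.
    """
    count = len(features)
    if count == 0:
        raise ValueError("cannot average over an empty dataset")
    total = 0.0
    for start in range(0, count, batch_size):
        end = start + batch_size
        batch_x = features[start:end]
        batch_y = labels[start:end]
        total += score_fn(batch_x, batch_y) * len(batch_x)
    return total / count


# Assumed usage inside the question's session (names taken from the question):
#   score_fn = lambda bx, by: sess.run(accuracy_operation,
#                                      feed_dict={x: bx, y: by, prob: 1.0})
#   validation_accuracy = batched_mean(score_fn, X_valid, y_valid, BATCH_SIZE)
```

With this, only one batch of validation data is pushed through the graph at a time, so peak memory during evaluation should depend on BATCH_SIZE rather than on the full size of X_valid. The merged summary op would still need the full validation set, so the resulting number would have to be logged separately.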
