How do I fix the 'OutOfRangeError: End of sequence' error when training a CNN with TensorFlow?

Question

I am trying to train a CNN on my own dataset. I store the data in tfrecord files and read them with the tf.data.TFRecordDataset API. This works fine for my training dataset, but when I try to batch my validation dataset, an 'OutOfRangeError: End of sequence' error is raised. After searching online, I thought the problem was caused by the batch size of the validation set, which I had initially set to 32. But after I changed it to 2, the code ran for about 9 epochs and then the error was raised again.

I use an input function to handle the dataset; the code is below:

def input_fn(is_training, filenames, batch_size, num_epochs=1, num_parallel_reads=1):
    dataset = tf.data.TFRecordDataset(filenames,num_parallel_reads=num_parallel_reads)
    if is_training:
        dataset = dataset.shuffle(buffer_size=1500)
    dataset = dataset.map(parse_record)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(num_epochs)

    iterator = dataset.make_one_shot_iterator()

    features, labels = iterator.get_next()

    return features, labels

For the training set, "batch_size" is set to 128 and "num_epochs" is set to None, which means the dataset repeats indefinitely. For the validation set, "batch_size" is set to 32 (later changed to 2, which still didn't work) and "num_epochs" is set to 1, since I only want to go through the validation set once. I am confident that the validation set contains enough data, because the code below did not raise any errors:

with tf.Session() as sess:
    features, labels = input_fn(False, valid_list, 32, 1, 1)
    for i in range(450):
        sess.run([features, labels])
        print(labels.shape)

In the code above, when I changed 450 to 500 or anything larger, it raised the 'OutOfRangeError'. This confirms that my validation dataset contains enough data for 450 iterations with a batch size of 32.

I tried a smaller batch size (i.e., 2) for the validation set, but I still get the same error. I can get the code to run with "num_epochs" set to None in the input_fn for the validation set, but that does not seem to be how validation should work. Any help, please?

Answer 1

This behaviour is normal. From the TensorFlow documentation:

If the iterator reaches the end of the dataset, executing the Iterator.get_next() operation will raise a tf.errors.OutOfRangeError. After this point the iterator will be in an unusable state, and you must initialize it again if you want to use it further.

The error is not raised when you use dataset.repeat(None) because the dataset repeats indefinitely and is therefore never exhausted.
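The difference between a finite and an indefinitely repeated dataset can be illustrated without TensorFlow. The sketch below is only a plain-Python analogy: `batched_epochs` is a hypothetical helper, not part of the tf.data API, and it mimics the order of `batch(batch_size)` followed by `repeat(num_epochs)` used in the question. In this analogy, the generator running dry (Python's StopIteration) plays the role of tf.errors.OutOfRangeError.

```python
import itertools


def batched_epochs(examples, batch_size, num_epochs=1):
    """Plain-Python stand-in for dataset.batch(batch_size).repeat(num_epochs).

    Hypothetical helper for illustration only; num_epochs=None repeats forever.
    """
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        for start in range(0, len(examples), batch_size):
            # The last slice of an epoch may be smaller than batch_size.
            yield examples[start:start + batch_size]


examples = list(range(10))

# One epoch: the generator is exhausted after three batches (sizes 4, 4, 2).
finite = batched_epochs(examples, batch_size=4, num_epochs=1)
print([len(batch) for batch in finite])

# num_epochs=None: the generator never runs dry, so we must cap it ourselves.
endless = batched_epochs(examples, batch_size=4, num_epochs=None)
print(len(list(itertools.islice(endless, 100))))
```

With one epoch the consumer has to expect the end of the data; with `num_epochs=None` the consumer has to decide when to stop.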

To solve your issue, change your code to something like this:

n_steps = 450
...    

with tf.Session() as sess:
    # Training
    features, labels = input_fn(True, training_list, 32, 1, 1)

    for step in range(n_steps):
        sess.run([features, labels])
        ...
    ...
    # Validation
    features, labels = input_fn(False, valid_list, 32, 1, 1)
    try:
        # Keep pulling batches until the one-epoch validation set is exhausted.
        while True:
            sess.run([features, labels])
            ...
    except tf.errors.OutOfRangeError:
        print("End of dataset")  # ==> "End of dataset"
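If the number of validation examples is known, the number of batches in one epoch can also be computed up front, which explains why 450 iterations succeeded in the question while 500 failed. The sketch below shows only the arithmetic. The example count of 15000 is a made-up value for illustration, not the asker's real dataset size, and `batches_per_epoch` is a hypothetical helper. It assumes the default behaviour of `batch()`, which keeps the final, smaller batch.

```python
import math


def batches_per_epoch(num_examples, batch_size, drop_remainder=False):
    """Number of get_next() calls that succeed in a single epoch.

    Hypothetical helper: with drop_remainder=False the last, smaller
    batch is kept, so the count is rounded up.
    """
    if drop_remainder:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


# Made-up validation set size, chosen only for illustration.
num_examples = 15000

print(batches_per_epoch(num_examples, 32))  # 469: 450 calls succeed, 500 do not
print(batches_per_epoch(num_examples, 2))   # 7500
```

A smaller batch size only raises the number of calls available per epoch; it does not remove the limit, which matches the later failure with a batch size of 2.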

You can also make a few changes to your input_fn to run the evaluation at every epoch: 您还可以对input_fn进行一些更改以在每个时期运行评估：

def input_fn(is_training, filenames, batch_size, num_epochs=1, num_parallel_reads=1):
    dataset = tf.data.TFRecordDataset(filenames,num_parallel_reads=num_parallel_reads)
    if is_training:
        dataset = dataset.shuffle(buffer_size=1500)
    dataset = dataset.map(parse_record)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat(num_epochs)

    iterator = dataset.make_initializable_iterator()
    return iterator

n_epochs = 10
freq_eval = 1

training_iterator = input_fn(True, training_list, 32, 1, 1)
training_features, training_labels = training_iterator.get_next()

val_iterator = input_fn(False, valid_list, 32, 1, 1)
val_features, val_labels = val_iterator.get_next()

with tf.Session() as sess:
    for epoch in range(n_epochs):
        # Training: re-initialize the iterator so every epoch starts from the beginning
        sess.run(training_iterator.initializer)
        try:
            # Consume batches until the one-epoch dataset is exhausted
            while True:
                sess.run([training_features, training_labels])
        except tf.errors.OutOfRangeError:
            pass

        # Validation
        if (epoch + 1) % freq_eval == 0:
            sess.run(val_iterator.initializer)
            try:
                while True:
                    sess.run([val_features, val_labels])
            except tf.errors.OutOfRangeError:
                pass

I advise you to take a close look at the official guide if you want a better understanding of what is happening under the hood.

Answer 1 (2018-12-26 16:32:28)