
Unable to handle errors at end of epoch when using tf.contrib.data.Datasets API in TensorFlow

I have created tfrecords as my database. The database consists of 9 different tfrecord files. The goal is to feed the model a batch of samples drawn from all 9 files, so I used the zip function with TFRecordDataset. Each sample consists of a frame along with its feature set. I need to take 8 samples from each tfrecord file, which gives a total of 72 (features, image) pairs per batch. I therefore extracted the features along with the images, as shown in the code below.

Problem: When I reach the end of the 1st epoch, the remaining data is less than 72 samples. As a result, data from the second epoch is added to make up a batch of 72 samples. This is not desirable in my case, since I am training a recurrent neural network and I have a state that should stay consistent (no need to discuss that now).
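
The effect described above can be reproduced without TensorFlow. The following is a minimal sketch in plain Python, not the tf.contrib.data implementation: the helper names `batches` and `epoch`, the epoch length of 10 and the batch size of 4 are all made up for illustration. It compares batching one repeated stream against batching each epoch separately.

```python
from itertools import chain, islice

def batches(stream, size):
    """Yield consecutive lists of up to `size` items from `stream`."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical toy data: one epoch of 10 samples, each tagged with its epoch number.
def epoch(n, num_samples=10):
    return [(n, i) for i in range(num_samples)]

# Repeat-then-batch: both epochs form one stream, so a batch can straddle them.
repeated = list(batches(chain(epoch(0), epoch(1)), 4))
mixed = [b for b in repeated if len({e for e, _ in b}) > 1]

# Batch each epoch separately: no mixing, but the last batch of each epoch is short.
per_epoch = [list(batches(epoch(n), 4)) for n in range(2)]
short = [b for ep in per_epoch for b in ep if len(b) < 4]
```

Here `mixed` holds the one batch that contains samples from both epochs, and `short` holds the truncated final batch of each epoch, which is the case the rest of the post has to deal with.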

Therefore, I didn't use the repeat function. Instead, I tried to implement what is described in https://www.tensorflow.org/programmers_guide/datasets under "Processing multiple epochs", i.e., a for loop with try and except:

# Compute for 100 epochs.
for _ in range(100):
  sess.run(iterator.initializer)
  while True:
    try:
      sess.run(next_element)
    except tf.errors.OutOfRangeError:
      break

  # [Perform end-of-epoch calculations here.]

Once I did that, I ran into another problem. First, here is my full code:

import tensorflow as tf
import numpy as np
import time
import cv2

num_epoch = 1
batch_size = 8 # This is set to 8 since
num_threads = 9
common = "C:/Users/user/PycharmProjects/AffectiveComputingNew/database/"
filenames = [(common + "train_1_db.tfrecords"), (common + "train_2_db.tfrecords"), (common + "train_3_db.tfrecords"),
     (common + "train_4_db.tfrecords"), (common + "train_5_db.tfrecords"), (common + "train_6_db.tfrecords"),
     (common + "train_7_db.tfrecords"), (common + "train_8_db.tfrecords"), (common + "train_9_db.tfrecords")]

# Transforms a scalar string `example_proto` into a pair of a float32 feature
# vector of length 432 and a uint8 image tensor.
def _parse_function(example_proto):
    features = {
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'image_raw': tf.FixedLenFeature([], tf.string),
        'features': tf.FixedLenFeature([432], tf.float32)
    }

    parsed_features = tf.parse_single_example(example_proto, features)

    # This is how we create one example, that is, extract one example from the database.
    image = tf.decode_raw(parsed_features['image_raw'], tf.uint8)
    # The height and the weights are used to
    height = tf.cast(parsed_features['height'], tf.int32)
    width = tf.cast(parsed_features['width'], tf.int32)

    # The image is reshaped because it is flattened when stored in binary format. Therefore, we need the
    # height and the width to restore the original image.
    image = tf.reshape(image, [height, width, 3])

    features = parsed_features['features']

    return features, image

random_features = tf.Variable(tf.zeros([72, 432], tf.float32))
random_images = tf.Variable(tf.zeros([72, 112, 112, 3]))

datasets = []
for filename in filenames:
    datasets.append(tf.contrib.data.TFRecordDataset(filename).map(_parse_function))

dataset_ziped = tf.contrib.data.TFRecordDataset.zip((datasets[0], datasets[1], datasets[2], datasets[3],
      datasets[4], datasets[5], datasets[6], datasets[7], datasets[8]))
#dataset = dataset_ziped.repeat(num_epoch)
dataset = dataset_ziped.batch(batch_size)

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next() # This has shape: [9, 2]

features = tf.concat((next_batch[0][0], next_batch[1][0], next_batch[2][0], next_batch[3][0],
                      next_batch[4][0], next_batch[5][0], next_batch[6][0], next_batch[7][0],
                      next_batch[8][0]), axis=0)
features = tf.reshape(features, shape=[9, 8, 432]) # where 8 * 9 = 72
features = tf.transpose(features, perm=[1, 0, 2]) # shape becomes: [8, 9, 432]
features = tf.reshape(features, shape=[72, 432]) # Now frames will be: 1st frame from 1st video, second from second video...

images = tf.concat((next_batch[0][1], next_batch[1][1], next_batch[2][1], next_batch[3][1],
                    next_batch[4][1], next_batch[5][1], next_batch[6][1], next_batch[7][1],
                    next_batch[8][1]), axis=0)
images = tf.reshape(images, shape=[9, 8, 112, 112, 3])
images = tf.transpose(images, perm=[1, 0, 2, 3, 4])
images = tf.reshape(images, shape=[72, 112, 112, 3])

init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    # Initialize `iterator` with training data.
    sess.run(init_op)

    for _ in range(num_epoch):
        sess.run(iterator.initializer)

        # This while loop keeps running until the end of the current epoch
        while True:
            try:
                lst = []
                features_np = sess.run([features])[0] # since the output is always: (1, 72, 432)

                for f in features_np:
                    lst.append(f[0])

            except tf.errors.OutOfRangeError:
                print('errorrrrr')
                break  # leave the while loop at the end of the epoch

Since the number of samples in the last batch is no longer 72, I run into an error at this line:

features = tf.reshape(features, shape=[9, 8, 432]) # where 8 * 9 = 72
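
To make the failure concrete, here is a rough plain-Python sketch of what the reshape, transpose, reshape sequence does to the row order, and why it cannot work on a truncated batch. It uses lists instead of tensors. The function name `interleave`, the `v<file>f<frame>` labels and the figure of 5 frames per file in the truncated batch are assumptions for illustration only.

```python
NUM_FILES = 9   # number of zipped tfrecord files (from the post)
PER_FILE = 8    # batch_size in the post

def interleave(flat, num_files=NUM_FILES, per_file=PER_FILE):
    """Emulate reshape [num_files, per_file] -> transpose -> flatten on a list."""
    if len(flat) != num_files * per_file:
        # A fixed-shape reshape has the same requirement on the element count.
        raise ValueError("expected %d items, got %d"
                         % (num_files * per_file, len(flat)))
    rows = [flat[i * per_file:(i + 1) * per_file] for i in range(num_files)]
    return [rows[f][k] for k in range(per_file) for f in range(num_files)]

# Labels "v<file>f<frame>" stand in for the concatenated feature rows.
full = ["v%df%d" % (f, k) for f in range(NUM_FILES) for k in range(PER_FILE)]
ordered = interleave(full)

# A truncated final batch, here 5 frames per file instead of 8 (made-up number).
partial = ["v%df%d" % (f, k) for f in range(NUM_FILES) for k in range(5)]
try:
    interleave(partial)
    failed = False
except ValueError:
    failed = True
```

With a full batch, `ordered` starts with frame 0 of every file before moving to frame 1. With the truncated batch the element count no longer matches the fixed target shape, so the reshape step has nothing valid to produce.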

So, I need a way to handle this error. I tried an assertion as follows:

assert_op = tf.Assert(tf.equal(tf.shape(features[0]), batch_size * 9), [features])
with tf.control_dependencies([assert_op])... after features = tf.concat...

That didn't work. I also tried tf.cond as follows, and it didn't work either:

tf.cond(tf.equal(tf.shape(features)[0], batch_size * 9),
        lambda: tf.assign(random_features, features),
        lambda: tf.assign(random_features, random_features))

features = tf.reshape(random_features, shape=[9, 8, 432]) # where 8 * 9 = 72
....

In conclusion, I need a way to iterate over epochs without mixing samples from different epochs, while also handling the reshape of the features when the last batch has fewer than 72 samples.

Any help is much appreciated!

Since the last batch is truncated, I created a temporary variable that is assigned the value of the batch whenever the batch size is equal to the batch_size specified in the code. Once the size of the current batch is no longer equal to batch_size, I use the temporary batch that I created instead. Here is the solution:

import tensorflow as tf
import numpy as np
import time
import cv2

num_epoch = 2
batch_size = 8 # This is set to 8 since
num_threads = 9
common = "C:/Users/user/PycharmProjects/AffectiveComputingNew/database/"
filenames = [(common + "train_1_db.tfrecords"), (common + "train_2_db.tfrecords"), (common + "train_3_db.tfrecords"),
     (common + "train_4_db.tfrecords"), (common + "train_5_db.tfrecords"), (common + "train_6_db.tfrecords"),
     (common + "train_7_db.tfrecords"), (common + "train_8_db.tfrecords"), (common + "train_9_db.tfrecords")]

# Transforms a scalar string `example_proto` into a pair of a float32 feature
# vector of length 432 and a uint8 image tensor.
def _parse_function(example_proto):
    features = {
        'height': tf.FixedLenFeature([], tf.int64),
        'width': tf.FixedLenFeature([], tf.int64),
        'image_raw': tf.FixedLenFeature([], tf.string),
        'features': tf.FixedLenFeature([432], tf.float32)
    }

    parsed_features = tf.parse_single_example(example_proto, features)

    # This is how we create one example, that is, extract one example from the database.
    image = tf.decode_raw(parsed_features['image_raw'], tf.uint8)
    # The height and the weights are used to
    height = tf.cast(parsed_features['height'], tf.int32)
    width = tf.cast(parsed_features['width'], tf.int32)

    # The image is reshaped because it is flattened when stored in binary format. Therefore, we need the
    # height and the width to restore the original image.
    image = tf.reshape(image, [height, width, 3])

    features = parsed_features['features']

    return features, image
# Temporary variables used whenever the batch returned by the dataset does not have batch_size * 9 rows.
random_features = tf.Variable(tf.zeros([72, 432], tf.float32))
random_images = tf.Variable(tf.zeros([72, 112, 112, 3], tf.uint8))

datasets = []
for filename in filenames:
    datasets.append(tf.contrib.data.TFRecordDataset(filename).map(_parse_function))

dataset_ziped = tf.contrib.data.TFRecordDataset.zip((datasets[0], datasets[1], datasets[2], datasets[3],
      datasets[4], datasets[5], datasets[6], datasets[7], datasets[8]))
dataset = dataset_ziped.batch(batch_size)

iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next() # This has shape: [9, 2]

features = tf.concat((next_batch[0][0], next_batch[1][0], next_batch[2][0], next_batch[3][0],
                      next_batch[4][0], next_batch[5][0], next_batch[6][0], next_batch[7][0],
                      next_batch[8][0]), axis=0)
images = tf.concat((next_batch[0][1], next_batch[1][1], next_batch[2][1], next_batch[3][1],
                    next_batch[4][1], next_batch[5][1], next_batch[6][1], next_batch[7][1],
                    next_batch[8][1]), axis=0)

def get_features(features, images):
    with tf.control_dependencies([tf.assign(random_features, features), tf.assign(random_images, images)]):
        features = tf.reshape(features, shape=[9, 8, 432]) # where 8 * 9 = 72
        features = tf.transpose(features, perm=[1, 0, 2]) # shape becomes: [8, 9, 432]
        features = tf.reshape(features, shape=[72, 432]) # Now frames will be: 1st frame from 1st video, second from second video...

        images = tf.reshape(images, shape=[9, 8, 112, 112, 3])
        images = tf.transpose(images, perm=[1, 0, 2, 3, 4])
        images = tf.reshape(images, shape=[72, 112, 112, 3])
        return features, images

condition1 = tf.equal(tf.shape(features)[0], batch_size * 9)
condition2 = tf.equal(tf.shape(images)[0], batch_size * 9)

condition = tf.logical_and(condition1, condition2)

features, images = tf.cond(condition,
                           lambda: get_features(features, images),
                           lambda: get_features(random_features, random_images))

init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    # Initialize `iterator` with training data.
    sess.run(init_op)

    for _ in range(num_epoch):
        sess.run(iterator.initializer)

        # This while loop keeps running until the end of the current epoch
        while True:
            try:
                lst = []
                features_np, images_np = sess.run([features, images])

                for f in features_np:
                    lst.append(f[0])

                print(lst)
            except tf.errors.OutOfRangeError:
                print('errorrrrr')
                break
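
Stripped of the TensorFlow graph machinery, the idea in the code above can be sketched in a few lines of plain Python. This is only an illustration of the fallback logic, not a replacement for the graph code. The name `with_fallback` and the toy batches are hypothetical, and skipping a short batch that arrives before any full batch is an assumption of this sketch (the variables in the code above would still hold zeros at that point).

```python
def with_fallback(batches, full_size):
    """Yield each full batch; replace a short batch with the last full one seen."""
    last_full = None
    for batch in batches:
        if len(batch) == full_size:
            last_full = batch      # remember the most recent full batch
            yield batch
        elif last_full is not None:
            yield last_full        # short batch: reuse the remembered one
        # A short batch before any full batch is skipped (assumption of this sketch).

# Toy epoch: two full batches of 4 followed by a truncated batch of 2.
epoch_batches = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
result = list(with_fallback(epoch_batches, 4))
```

Note that the truncated batch is replaced by a repeat of the previous full batch, so those samples are seen twice and the leftover samples are never used. Later TensorFlow releases add a `drop_remainder=True` argument to `tf.data.Dataset.batch`, which discards the short final batch instead; whether that option is available depends on the installed version.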

Please note that I keep mentioning batch_size * 9 because zipping the datasets creates samples whose elements are fetched from 9 different datasets. Since I set batch_size to 8, I took 8 * 9 to get 72 samples.
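
The 8 * 9 arithmetic can be checked with ordinary lists. In this small sketch, the list-of-strings "datasets" and the 20 records per file are hypothetical stand-ins for the parsed tfrecord files, and Python's built-in `zip` plays the role of the dataset zip.

```python
NUM_FILES = 9
BATCH_SIZE = 8

# Hypothetical stand-ins for the 9 parsed datasets: 20 records each.
datasets = [["file%d_rec%d" % (f, r) for r in range(20)] for f in range(NUM_FILES)]

# zip yields one element per step, holding one record from every file.
zipped = list(zip(*datasets))

# Batching BATCH_SIZE zipped elements groups the records per file ...
first_batch = zipped[:BATCH_SIZE]
per_file = [list(column) for column in zip(*first_batch)]

# ... and concatenating the 9 groups gives the 72 rows used in the post.
concatenated = [rec for group in per_file for rec in group]
```

Each zipped element carries 9 records, a batch carries 8 such elements, and concatenating along the first axis therefore gives 9 groups of 8, or 72 rows, ordered file by file before any transpose.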


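Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.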
