How do I access a tf.data.Dataset within a keras custom callback?

Question

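I have written a custom keras callback to check the augmented data coming from a generator. (See this answer for the full code.) However, when I try to use the same callback with a tf.data.Dataset, it gives me an error: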

  File "/path/to/tensorflow_image_callback.py", line 16, in on_batch_end
    imgs = self.train[batch][images_or_labels]
TypeError: 'PrefetchDataset' object is not subscriptable

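Do keras callbacks in general only work with generators, or is it something about the way I've written mine? Is there a way to modify either my callback or the dataset to make it work?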
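The error message points at the underlying distinction: the generator in the original callback supports indexing with `[]`, while the dataset object here can only be iterated. The plain-Python sketch below illustrates that difference. `BatchStream` and `nth_batch` are hypothetical stand-ins written for this illustration and are not TensorFlow APIs.

```python
from itertools import islice


class BatchStream:
    """Hypothetical stand-in for an iterable-only dataset (no __getitem__)."""

    def __init__(self, batches):
        self._batches = list(batches)

    def __iter__(self):
        return iter(self._batches)


def nth_batch(stream, n):
    """Return the n-th batch by iterating, since indexing is unavailable."""
    return next(islice(iter(stream), n, None))


stream = BatchStream([("images0", "labels0"), ("images1", "labels1")])

try:
    stream[1]  # mirrors self.train[batch] in the callback
except TypeError as err:
    message = str(err)  # the same kind of "not subscriptable" error

# Iterating works where indexing does not.
images, labels = nth_batch(stream, 1)
```

Reaching the n-th batch this way means walking through the earlier ones first, so it is only a reasonable pattern for occasional inspection.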

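I think there are three parts to this puzzle, and I'm open to changes to any or all of them. First, the __init__ function in the custom callback class: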

class TensorBoardImage(tf.keras.callbacks.Callback):
    def __init__(self, logdir, train, validation=None):
        super(TensorBoardImage, self).__init__()
        self.logdir = logdir
        self.file_writer = tf.summary.create_file_writer(logdir)
        self.train = train
        self.validation = validation

Second, the on_batch_end function within that same class:

def on_batch_end(self, batch, logs):
    images_or_labels = 0 #0=images, 1=labels
    imgs = self.train[batch][images_or_labels]

Third, instantiating the callback:

import tensorflow_image_callback
tensorboard_image_callback = tensorflow_image_callback.TensorBoardImage(logdir=tensorboard_log_dir, train=train_dataset, validation=valid_dataset)
model.fit(train_dataset,
          epochs=n_epochs,
          validation_data=valid_dataset, 
          callbacks=[
                    tensorboard_callback,
                    tensorboard_image_callback
                    ])

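Some related threads that have not yet led me to an answer:

Accessing validation data within a custom callback

Create keras callback to save model predictions and targets for each batch during training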

Answer 1

What ended up working for me was the following, using tfds:

The __init__ function:

def __init__(self, logdir, train, validation=None):
    super(TensorBoardImage, self).__init__()
    self.logdir = logdir
    self.file_writer = tf.summary.create_file_writer(logdir)
    # #from keras generator
    # self.train = train
    # self.validation = validation
    #from tf.Data
    my_data = tfds.as_numpy(train)
    imgs = my_data['image']
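For comparison, a single batch can also be pulled out of a tf.data.Dataset as NumPy arrays with the core `as_numpy_iterator()` method instead of tfds. This is a minimal sketch under assumptions: the dataset yields dictionaries with an 'image' key (as the `my_data['image']` lookup above suggests), and the toy shapes and the names `toy_dataset` and `first_batch` are made up for illustration.

```python
import tensorflow as tf

# Toy dataset of dict elements; real data would come from tfds or files.
toy_dataset = tf.data.Dataset.from_tensor_slices(
    {
        "image": tf.zeros([8, 16, 16, 3], dtype=tf.float32),
        "label": tf.range(8),
    }
).batch(4)

# The dataset is iterable, not indexable, so take the first batch by iterating.
first_batch = next(toy_dataset.as_numpy_iterator())
imgs = first_batch["image"]    # NumPy array of shape (4, 16, 16, 3)
labels = first_batch["label"]  # NumPy array of shape (4,)
```

If the dataset yields (image, label) tuples instead, the batch would be unpacked positionally rather than by key.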

Then on_batch_end:

def on_batch_end(self, batch, logs):
    images_or_labels = 0 #0=images, 1=labels
    imgs = self.train[batch][images_or_labels]

    #calculate epoch
    n_batches_per_epoch = self.train.samples / self.train.batch_size
    epoch = math.floor(self.train.total_batches_seen / n_batches_per_epoch)

    #since the training data is shuffled each epoch, we need to use the index_array to find something which uniquely 
    #identifies the image and is constant throughout training
    first_index_in_batch = batch * self.train.batch_size
    last_index_in_batch = first_index_in_batch + self.train.batch_size
    last_index_in_batch = min(last_index_in_batch, len(self.train.index_array))
    img_indices = self.train.index_array[first_index_in_batch : last_index_in_batch]

    with self.file_writer.as_default():
        for ix,img in enumerate(imgs):
            #only post 1 out of every 1000 images to tensorboard
            if (img_indices[ix] % 1000) == 0:
                #instead of img_filename, I could just use str(img_indices[ix]) as a unique identifier
                #but this way makes it easier to find the unaugmented image
                img_filename = self.train.filenames[img_indices[ix]]

                #convert float to uint8, shift range to 0-255
                img -= tf.reduce_min(img)
                img *= 255 / tf.reduce_max(img)
                img = tf.cast(img, tf.uint8)
                img_tensor = tf.expand_dims(img, 0) #tf.summary needs a 4D tensor
                
                tf.summary.image(img_filename, img_tensor, step=epoch)
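The on_batch_end above still relies on generator attributes such as `index_array` and `filenames`, which a tf.data.Dataset may not provide. Below is a hedged sketch of one possible dataset-only variant, not the code from this answer. It assumes the dataset is batched and yields (images, labels) tuples; the class name `DatasetImageLogger`, the tag "train_batch", and the `max_images` parameter are hypothetical. Because Keras callback hooks receive only the batch index and logs, the sketch draws its own batch from the dataset once per epoch, so with a shuffled dataset it is not necessarily a batch the model has just trained on.

```python
import tensorflow as tf


class DatasetImageLogger(tf.keras.callbacks.Callback):
    """Hypothetical callback: logs a few images from a batched dataset per epoch."""

    def __init__(self, logdir, dataset, max_images=3):
        super().__init__()
        self.file_writer = tf.summary.create_file_writer(logdir)
        self.dataset = dataset
        self.max_images = max_images

    def on_epoch_end(self, epoch, logs=None):
        # take(1) starts a fresh pass over the dataset and yields one batch.
        for images, _labels in self.dataset.take(1):
            images = tf.cast(images, tf.float32)
            lo = tf.reduce_min(images)
            hi = tf.reduce_max(images)
            # Rescale to [0, 1]; tf.summary.image accepts floats in that range.
            scaled = (images - lo) / tf.maximum(hi - lo, 1e-8)
            with self.file_writer.as_default():
                tf.summary.image(
                    "train_batch", scaled, step=epoch, max_outputs=self.max_images
                )
```

It would be passed to training in the same way as the original callback, for example `callbacks=[DatasetImageLogger(tensorboard_log_dir, train_dataset)]` in the `model.fit` call.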

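I didn't need to make any changes to the instantiation.

I would recommend using this only for debugging, since otherwise it saves every nth image in the dataset to TensorBoard on every epoch, which can end up using a lot of disk space.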
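To get a rough sense of scale, the number of image summaries written can be estimated from the `% 1000` filter in on_batch_end. This is a back-of-the-envelope sketch that assumes every sample index from 0 to n_samples - 1 is seen exactly once per epoch; the function name and the example numbers are illustrative only.

```python
import math


def summaries_written(n_samples, n_epochs, every_n=1000):
    """Estimate how many image summaries the callback writes in total.

    Indices 0, every_n, 2 * every_n, ... pass the `index % every_n == 0`
    test, which is ceil(n_samples / every_n) images per epoch.
    """
    per_epoch = math.ceil(n_samples / every_n)
    return per_epoch * n_epochs


# Example: 50,000 training images over 20 epochs.
total = summaries_written(n_samples=50_000, n_epochs=20)
```

Actual disk usage then depends on the image resolution and on how the summaries are encoded.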
