
How to augment data in a Tensorflow Dataset?

For a set of images, I was confused about whether the term Data Augmentation means transforming the current dataset (e.g. crop/flip/rotate/...) or increasing the amount of data by adding the cropped/flipped/rotated images to the initial dataset. As far as I understand, from this question and this one, it means both. Please correct me if I'm wrong.

So, using a Tensorflow Dataset, I want to achieve the second one: increasing the amount of data.

I'm using the ImageNet data from TFDS (the training set is not available):

import tensorflow as tf
import tensorflow_datasets as tfds
ds = tfds.load('imagenet_a', split='test', as_supervised=True)

And I want to flip the images:

def transform(image, label):
    image = tf.image.flip_left_right(image)
    return image, label

It works well if I apply the transformation directly to the dataset, but that doesn't increase the amount of data:

ds = ds.map(transform)

So, I tried to create a second dataset and concatenate both:

ds0 = ds.map(transform)
ds = ds.concatenate(ds0)

But I get the following error:

TypeError: Two datasets to concatenate have different types (tf.uint8, tf.int64) and (tf.float32, tf.int64)

Is concatenating two datasets the right way to enlarge a training set? If not, how do I do it correctly (or how do I fix my error)?

I'm aware of ImageDataGenerator, but it doesn't contain the transformation I want.

As the error says, the two datasets must have the same element types. You can achieve this with tf.cast, but this can be tedious for a large dataset.
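As a minimal sketch of the tf.cast route: the dtypes in the error message suggest that one branch held uint8 images and the other float32 images, so casting both branches to a single dtype before concatenating should make the element types match. The random toy data and the helper names `to_float` and `flip` are assumptions made for illustration, standing in for the TFDS dataset and the asker's transform.

```python
import tensorflow as tf

# Toy stand-in for the TFDS data (an assumption): 4 random uint8 images, int64 labels.
images = tf.cast(tf.random.uniform([4, 8, 8, 3], maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.constant([0, 1, 2, 3], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

def to_float(image, label):
    # Cast every image to float32 so both branches share one dtype.
    return tf.cast(image, tf.float32), label

def flip(image, label):
    # Deterministic horizontal flip; the label is unchanged.
    return tf.image.flip_left_right(image), label

original = ds.map(to_float)
flipped = ds.map(to_float).map(flip)

# Both branches now have identical element types, so concatenate is accepted.
augmented = original.concatenate(flipped)
print(augmented.element_spec)
print(sum(1 for _ in augmented))
```

Here the cast is applied inside map(), so it runs lazily per element rather than over the whole dataset up front.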

You can also merge datasets using tf.data.experimental.sample_from_datasets.

Below is the code, with illustrations.

import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import array_to_img

# with_info=True also returns the dataset metadata needed by show_examples
ds, info = tfds.load('imagenet_a', split='test', as_supervised=True, with_info=True)

Original sample images:

vis = tfds.visualization.show_examples(ds, info)


I take 10 images for testing and randomly flip them with the map() function to create a new dataset.

ds1 = ds.take(10)
ds2 = ds1.map(lambda image, label: (tf.image.random_flip_left_right(image), label))

# Merge both datasets
new_ds = tf.data.experimental.sample_from_datasets([ds1, ds2])
print(len(list(new_ds)))  # Prints 20: 10 original plus 10 randomly flipped images.

f, axarr = plt.subplots(5, 4, figsize=(15, 15))

# Draw each image of the merged dataset in its own cell of the 5x4 grid
for ax, (image, label) in zip(axarr.flat, new_ds):
    ax.imshow(array_to_img(image))

Merged dataset:

You can see the merged data with original images and randomly flipped images.
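Two details of the approach above may be worth a small sketch. First, tf.image.random_flip_left_right flips each image only about half the time, so some of the "augmented" images can be identical to the originals; tf.image.flip_left_right always flips. Second, newer TensorFlow releases expose sample_from_datasets on tf.data.Dataset itself, and the sketch below falls back to the experimental name when that attribute is missing. The toy data is an assumption standing in for the 10 ImageNet images.

```python
import tensorflow as tf

# Toy stand-in data (an assumption): 10 tiny uint8 images with labels 0..9.
images = tf.cast(tf.random.uniform([10, 4, 4, 3], maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.range(10, dtype=tf.int64)
ds1 = tf.data.Dataset.from_tensor_slices((images, labels))

# flip_left_right always flips, so every element of ds2 is a mirrored copy.
ds2 = ds1.map(lambda image, label: (tf.image.flip_left_right(image), label))

# Prefer the non-experimental name when this TensorFlow version provides it.
sample = getattr(tf.data.Dataset, "sample_from_datasets", None)
if sample is None:
    sample = tf.data.experimental.sample_from_datasets

# Elements are drawn at random from ds1 and ds2 until both are exhausted.
mixed = sample([ds1, ds2], seed=0)
mixed_labels = [int(label) for _, label in mixed]
print(len(mixed_labels))
```

Because the two sources are sampled at random, originals and flipped copies come out interleaved; ds1.concatenate(ds2) would instead yield all originals first, which works here because both datasets share the same element types.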
