
Keras Data Augmentation Parameters

I read some materials about data augmentation in Keras, but it is still a bit vague to me. Is there any parameter to control the number of images created from each input image in the data augmentation step? In this example, I can't see any parameter that controls how many images are created from each input image.

For example, in the code below I can use a parameter (num_imgs) to control the number of images created from each input image and stored in a folder called preview; but in real-time data augmentation there is no parameter for this purpose.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
num_imgs = 20
datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image
x = img_to_array(img)  # a NumPy array with shape (3, 150, 150) (Theano dim ordering; (150, 150, 3) with TensorFlow ordering)
x = x.reshape((1,) + x.shape)  # add a batch axis, giving shape (1, 3, 150, 150)

# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory
i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='cat', save_format='jpeg'):
    i += 1
    if i >= num_imgs:  # `>=`, not `>`, so exactly num_imgs images are saved
        break  # otherwise the generator would loop indefinitely

Data augmentation works as follows: at each training epoch, transformations with randomly selected parameters within the specified ranges are applied to every original image in the training set. After an epoch is completed, i.e. after the learning algorithm has been exposed to the entire training set, the next epoch starts and the training data is augmented again by applying the specified transformations to the original training data.
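The per-epoch behavior can be sketched with plain NumPy (a toy random-shift transformation stands in for ImageDataGenerator here; the `augment` function is illustrative, not part of Keras):

```python
import numpy as np

def augment(image, rng):
    # Toy stand-in for a Keras random transformation: shift the image
    # by a random number of pixels (-2..2) along the width axis.
    shift = int(rng.integers(-2, 3))
    return np.roll(image, shift, axis=1)

rng = np.random.default_rng(0)
train_set = [np.arange(16, dtype=float).reshape(4, 4)]  # one 4x4 "image"

# Each epoch re-augments the ORIGINAL images with freshly drawn random
# parameters, so the model sees a different variant of each image per epoch.
num_epochs = 5
for epoch in range(num_epochs):
    epoch_batch = [augment(img, rng) for img in train_set]
    # ... train on epoch_batch here ...
```

The key point is that augmentation never enlarges the stored dataset; the same originals are simply re-transformed on the fly every epoch.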

In that way, the number of times each image is augmented equals the number of training epochs. Recall from the example that you linked:

# Fit the model on the batches generated by datagen.flow().
# (Keras 1 API: in Keras 2 these arguments became steps_per_epoch and
# epochs, and fit_generator was later folded into model.fit.)
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                    samples_per_epoch=X_train.shape[0],
                    nb_epoch=nb_epoch,
                    validation_data=(X_test, Y_test))

Here the datagen object exposes the training set to the model nb_epoch times, so each image is augmented nb_epoch times. This way, the learning algorithm almost never sees two exactly identical training examples, because at each epoch the training examples are randomly transformed.

This is basically how it works: the generator produces one augmented image per input image, and once every input image has been used, it starts over again.

In your example, because there is only one input image in total, it will repeatedly generate different randomly transformed versions of that image until twenty of them have been saved.
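This cycling behavior can be mimicked with a minimal generator (a NumPy-only sketch; `make_flow` is a hypothetical stand-in for `datagen.flow`, not a Keras function):

```python
import numpy as np

def make_flow(images, rng):
    # Hypothetical stand-in for datagen.flow(): cycle over the input
    # images forever, yielding a (randomly shifted) copy on each pass.
    while True:
        for img in images:
            shift = int(rng.integers(-1, 2))
            yield np.roll(img, shift, axis=0)

rng = np.random.default_rng(42)
images = [np.ones((2, 2)) * k for k in range(3)]  # three constant "images"

flow = make_flow(images, rng)
num_imgs = 7
generated = [next(flow) for _ in range(num_imgs)]
# With 3 inputs and 7 draws the generator wraps around: draws 0-2 come
# from images 0-2, draws 3-5 again from images 0-2, and draw 6 from image 0.
```

Because the real generator loops indefinitely in the same way, the number of images you obtain is controlled entirely by how many batches you draw (or by steps_per_epoch during training), not by a per-image count parameter.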

You can take a look at the source code here: https://github.com/fchollet/keras/blob/master/keras/preprocessing/image.py
