
How does data normalization work in Keras during prediction?

I see that the ImageDataGenerator allows me to specify different styles of data normalization, e.g. featurewise_center, samplewise_center, etc.

I see from the examples that if I specify one of these options, then I need to call the fit method on the generator so that it can compute statistics such as the mean image of the training data.

from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils

nb_classes = 10
nb_epoch = 20  # any number of epochs

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)

# compute quantities required for featurewise normalization
# (std, mean, and principal components if ZCA whitening is applied)
datagen.fit(X_train)

# fits the model on batches with real-time data augmentation:
model.fit_generator(datagen.flow(X_train, Y_train, batch_size=32),
                samples_per_epoch=len(X_train), nb_epoch=nb_epoch)

My question is: how does prediction work if I have specified data normalization during training? I can't see how the framework would even let me pass knowledge of the training-set mean/standard deviation along to predict so that I can normalize my test data myself, and I also don't see where this information is stored in the training code.

Are the image statistics needed for normalization stored in the model so that they can be used during prediction?

Yes. This is a really big downside of Keras's ImageDataGenerator: you can't provide the standardization statistics on your own. But there is an easy way to overcome this issue.

Assuming that you have a function normalize(x) which normalizes an image batch (remember that the generator does not provide a single image but an array of images, i.e. a batch with shape (nr_of_examples_in_batch, image_dims ..)), you can make your own generator with normalization by using:

def gen_with_norm(gen, normalize):
    for x, y in gen:
        yield normalize(x), y

Then you can simply use gen_with_norm(datagen.flow(X_train, Y_train, batch_size=32), normalize) wherever you would otherwise pass datagen.flow(X_train, Y_train, batch_size=32).

Moreover, you can recover the mean and std computed by the fit method from the corresponding fields on datagen (e.g. datagen.mean and datagen.std).
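For example, a minimal sketch of doing exactly that at prediction time (the normalization formula and the small epsilon are assumptions that mirror what Keras's featurewise standardization does; they are not taken from the answer above):

# Sketch: after datagen.fit(X_train), datagen.mean and datagen.std hold the
# training-set statistics, so test data can be normalized by hand the same way.
X_test = X_test.astype('float32')
X_test_norm = (X_test - datagen.mean) / (datagen.std + 1e-7)  # epsilon avoids division by zero
predictions = model.predict(X_test_norm)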

Use the standardize method of the generator for each element. Here is a complete example for CIFAR-10:

#!/usr/bin/env python

import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# input image dimensions
img_rows, img_cols, img_channels = 32, 32, 3
num_classes = 10

batch_size = 32
epochs = 1

# The data, shuffled and split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()

model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                 input_shape=x_train.shape[1:]))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

datagen = ImageDataGenerator(zca_whitening=True)

# Compute principal components required for ZCA
datagen.fit(x_train)

# Apply normalization (ZCA and others)
print(x_test.shape)
for i in range(len(x_test)):
    # this is what you are looking for
    x_test[i] = datagen.standardize(x_test[i])
print(x_test.shape)

# Fit the model on the batches generated by datagen.flow().
model.fit_generator(datagen.flow(x_train, y_train,
                                 batch_size=batch_size),
                    steps_per_epoch=x_train.shape[0] // batch_size,
                    epochs=epochs,
                    validation_data=(x_test, y_test))
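The same standardize call can be applied to data that is only seen at prediction time. A minimal sketch, where new_images is just a random placeholder for real inputs with the same shape and [0, 1] scaling as x_train:

import numpy as np

# Sketch: standardize applies the statistics computed by datagen.fit(x_train)
# (here including ZCA whitening) to each not-yet-normalized sample.
new_images = np.random.rand(5, img_rows, img_cols, img_channels).astype('float32')  # placeholder data
for i in range(len(new_images)):
    new_images[i] = datagen.standardize(new_images[i])
probs = model.predict(new_images)
print(probs.argmax(axis=1))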

I am using the datagen.fit function itself.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=True)
train_datagen.fit(train_data)

test_datagen = ImageDataGenerator(  
    featurewise_center=True, 
    featurewise_std_normalization=True)
test_datagen.fit(train_data)

Ideally with this, test_datagen fitted on the training dataset learns the training dataset's statistics and then uses those statistics to normalize the test data.
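A minimal sketch of how both fitted generators might then be used; model, train_labels, test_data and test_labels are hypothetical placeholders that are not defined above:

import numpy as np

# Sketch: both generators apply the statistics computed from train_data.
batch_size = 32
model.fit_generator(
    train_datagen.flow(train_data, train_labels, batch_size=batch_size),
    steps_per_epoch=int(np.ceil(len(train_data) / batch_size)),
    epochs=10,
    validation_data=test_datagen.flow(test_data, test_labels, batch_size=batch_size),
    validation_steps=int(np.ceil(len(test_data) / batch_size)))

# prediction: flow without labels, with shuffle disabled so outputs keep their order
predictions = model.predict_generator(
    test_datagen.flow(test_data, batch_size=batch_size, shuffle=False),
    steps=int(np.ceil(len(test_data) / batch_size)))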

I also had the same issue, and I solved it using the same functionality that the ImageDataGenerator uses:

from keras import backend as K
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator

# Load the CIFAR-10 dataset and convert the images to float so the
# statistics can be subtracted in place below
(trainX, trainY), (testX, testY) = cifar10.load_data()
trainX = trainX.astype('float32')
testX = testX.astype('float32')

generator = ImageDataGenerator(featurewise_center=True,
                               featurewise_std_normalization=True)

# Calculate statistics on the train dataset
generator.fit(trainX)
# Apply featurewise_center to the test data with statistics from the train data
testX -= generator.mean
# Apply featurewise_std_normalization to the test data with statistics from the train data
testX /= (generator.std + K.epsilon())

# Do your regular fitting
model.fit_generator(..., validation_data=(testX, testY), ...)

Note that this is only practical if you have a reasonably small dataset, like CIFAR-10. Otherwise the solution proposed by Marcin is more reasonable.
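If prediction happens in a separate script, one option is to persist the statistics and reapply them there. A minimal sketch using plain NumPy files; the file names and the random new_x batch are placeholders, and the epsilon mirrors the small constant Keras adds to the std:

import numpy as np

# Sketch: save the training-set statistics so a separate prediction script
# can normalize new inputs the same way; file names are arbitrary.
np.save('featurewise_mean.npy', generator.mean)
np.save('featurewise_std.npy', generator.std)

# later, in the prediction script (model is the trained model from above)
mean = np.load('featurewise_mean.npy')
std = np.load('featurewise_std.npy')
new_x = np.random.rand(1, 32, 32, 3).astype('float32')  # placeholder for a real image batch
new_x = (new_x - mean) / (std + 1e-7)
predictions = model.predict(new_x)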
