
TensorFlow does not apply data augmentation properly

I'm trying to apply the process of data augmentation to a database. I use the following code:

train_generator = keras.utils.image_dataset_from_directory(
    directory=train_dir,
    subset="training",
    image_size=(50, 50),
    batch_size=32,
    validation_split=0.3,
    seed=1337,
    labels="inferred",
    label_mode='binary'
)

validation_generator = keras.utils.image_dataset_from_directory(
    directory=validation_dir,
    subset="validation",
    image_size=(50, 50),
    batch_size=40,
    seed=1337,
    validation_split=0.3,
    labels="inferred",
    label_mode='binary'
)

data_augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])

train_dataset = train_generator.map(lambda x, y: (data_augmentation(x, training=True), y))

But when I try to run the training process using this method, I get an "insufficient data" warning:

6/100 [>.............................] - ETA: 21s - loss: 0.7602 - accuracy: 0.5200WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 10 batches). You may need to use the repeat() function when building your dataset.

Yes, the original dataset is insufficient on its own, but the data augmentation should provide more than enough data for training. Does anyone know what's going on?

EDIT:

fit call:

history = model.fit(
    train_dataset,
    epochs=20,
    steps_per_epoch=100,
    validation_data=validation_generator,
    validation_steps=10,
    callbacks=callbacks_list)

This is the version I have using ImageDataGenerator:

train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1/255, rotation_range=40, width_shift_range=0.2,
    height_shift_range=0.2, shear_range=0.2, zoom_range=0.2,
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    directory=train_dir, target_size=(50, 50), batch_size=32, class_mode='binary')

val_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
validation_generator = val_datagen.flow_from_directory(
    directory=validation_dir, target_size=(50, 50), batch_size=40, class_mode='binary')

This specific code (with the same number of epochs, steps_per_epoch and batch size) was taken from the book Deep Learning with Python by François Chollet; it's an example of a data augmentation setup on page 141. As you may have guessed, it produces the same results as the other method shown above.

You probably have fewer images in your directory than you are asking your model to fit on.

When we say that data augmentation increases the number of instances, what actually happens is that an altered version of each sample is created on the fly for the model to process. It is just image preprocessing with randomness; it does not add new batches to the dataset.
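A minimal sketch of that point (using stand-in random tensors instead of your directory data, so the numbers are hypothetical): mapping the augmentation pipeline over a dataset changes the pixel values of each batch, but the number of batches stays exactly the same.

import tensorflow as tf
from tensorflow import keras

# the same augmentation pipeline as in the question
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])

# stand-in data: 160 random 50x50 "images" -> 5 batches of 32
images = tf.random.uniform((160, 50, 50, 3))
labels = tf.zeros((160, 1))
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

augmented = ds.map(lambda x, y: (data_augmentation(x, training=True), y))

print(ds.cardinality().numpy())         # 5
print(augmented.cardinality().numpy())  # 5 -- same number of batches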

If you look closely at your training log, the warning itself points to the solution, shown below. The main issue with your approach is discussed in this post.

WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.

With steps_per_epoch = 100 and epochs = 20, fit asks for 100 × 20 = 2000 batches, but a dataset built with image_dataset_from_directory yields its batches only once (6 in your log) and is not repeated, so it runs out almost immediately. To solve this, we can use the .repeat() function. To understand what it does, you can check this answer. Here is sample code that should work for you.

train_ds = keras.utils.image_dataset_from_directory(
    ...
)
train_ds = train_ds.map(
    lambda x, y: (data_augmentation(x, training=True), y)
)
val_ds = keras.utils.image_dataset_from_directory(
    ...
)

# record the number of batches per pass *before* repeating;
# after .repeat() the cardinality becomes infinite
train_steps = train_ds.cardinality().numpy()
val_steps = val_ds.cardinality().numpy()

# use .repeat() so the datasets never run out of batches
train_ds = train_ds.repeat().shuffle(8 * batch_size)  # batch_size: the batch size used above
train_ds = train_ds.prefetch(buffer_size=tf.data.AUTOTUNE)

val_ds = val_ds.repeat()
val_ds = val_ds.prefetch(buffer_size=tf.data.AUTOTUNE)

# specify steps per epoch explicitly, since the repeated datasets are infinite
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=...,
    steps_per_epoch=train_steps,
    validation_steps=val_steps,
)
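As an alternative sketch (not part of the original answer): if you do not need a fixed steps_per_epoch, you can skip .repeat() entirely and let fit() run one full pass over each finite dataset per epoch.

history = model.fit(
    train_ds,                # finite, non-repeated, augmented dataset
    validation_data=val_ds,  # finite, non-repeated validation dataset
    epochs=20,
)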
