
Performing 10-fold cross-validation in training with ImageDataGenerator

I have created a CNN for binary classification on a dataset of 400 images. My code is the following:

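# Imports assumed to come from tf.keras; the post did not show them
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint

def neural_network():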
  classifier = Sequential()

  # Adding a first convolutional layer
  classifier.add(Convolution2D(48, 3, input_shape = (320, 320, 3), activation = 'relu'))
  classifier.add(MaxPooling2D())

  # Adding a second convolutional layer
  classifier.add(Convolution2D(48, 3, activation = 'relu'))
  classifier.add(MaxPooling2D())

  #Flattening
  classifier.add(Flatten())

  # Fully connected hidden layer
  classifier.add(Dense(256, activation = 'relu'))
 
  # Output layer: a single sigmoid unit for binary classification
  classifier.add(Dense(1, activation = 'sigmoid'))


  classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

  classifier.summary()


  train_datagen = ImageDataGenerator(rescale = 1./255,
                                    shear_range = 0.2,
                                    horizontal_flip = True,
                                    vertical_flip=True,
                                    brightness_range=[0.5, 1.5])

  test_datagen = ImageDataGenerator(rescale = 1./255)
  test_final_datagen = ImageDataGenerator(rescale = 1./255)
  test_final_four = ImageDataGenerator(rescale = 1./255)

  training_set = train_datagen.flow_from_directory('/content/drive/My Drive/data_sep/train',
                                                  target_size = (320, 320),
                                                  batch_size = 32,
                                                  class_mode = 'binary')

  test_set = test_datagen.flow_from_directory('/content/drive/My Drive/data_sep/validate',
                                              target_size = (320, 320),
                                              batch_size = 32,
                                              class_mode = 'binary')

  
  test_final = test_final_datagen.flow_from_directory('/content/drive/My Drive/data_sep/validate',
                                              target_size = (320, 320),
                                              batch_size = 32,
                                              class_mode = 'binary',
                                              shuffle = False)

  filepath  = "/content/drive/My Drive/data_sep/weightsbestval.hdf5"
  checkpoint = ModelCheckpoint(filepath, monitor='val_accuracy', verbose=1, save_best_only=True, mode='max',save_weights_only=True)
  callbacks_list = [checkpoint]

  # callbacks_list is already a list, so pass it directly rather than nesting it
  history = classifier.fit(training_set,
                          epochs  = 50,
                          validation_data = test_set,
                          callbacks = callbacks_list
                          )
  
  
  best_score = max(history.history['val_accuracy'])

How can I perform 10-fold cross-validation on my dataset? I have not seen 10-fold cross-validation combined with data augmentation anywhere, but with so few images the accuracy will be very low without augmentation. What can I do?

Conceptually, what you need is the following:

  • dump all images into a single directory
  • put all filenames into a dataframe
  • generate indices for k-fold with sklearn.model_selection.KFold
  • run 10 cycles of:
    • select train and validation filenames by slicing the dataframe with the k-fold indices
    • use ImageDataGenerator.flow_from_dataframe() to feed the model
    • evaluate the model

Read more in the flow_from_dataframe() docs.
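
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions, not a definitive implementation: the helper names list_images, make_folds and cross_validate are made up for this example, the images are assumed to sit in one sub-directory per class (rather than literally in a single flat directory, so the label can be read from the folder name), and build_model is a hypothetical function you would supply that returns a freshly compiled model (for instance the layer and compile part of neural_network() from the question). It also assumes a TensorFlow version that still ships ImageDataGenerator.

```python
import os

import pandas as pd
from sklearn.model_selection import KFold


def list_images(root_dir):
    """Return a DataFrame with one row per image: full path and class label.

    Assumes the layout root_dir/<class_name>/<image file>.
    """
    rows = []
    for label in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, label)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            rows.append({"filename": os.path.join(class_dir, name), "class": label})
    return pd.DataFrame(rows, columns=["filename", "class"])


def make_folds(df, n_splits=10, seed=42):
    """Yield one (train_df, val_df) pair per fold."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in kfold.split(df):
        yield df.iloc[train_idx], df.iloc[val_idx]


def cross_validate(df, build_model, epochs=50, n_splits=10):
    """Train a fresh model on every fold and return the validation accuracies.

    build_model is a hypothetical callable returning a compiled Keras model
    whose metrics are ['accuracy'], as in the question.
    """
    # Imported here so the two helpers above work without TensorFlow installed.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Augmentation is applied to the training folds only;
    # validation images are just rescaled.
    train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2,
                                       horizontal_flip=True, vertical_flip=True,
                                       brightness_range=[0.5, 1.5])
    val_datagen = ImageDataGenerator(rescale=1./255)

    # Fix the class order so every fold maps labels to the same indices.
    flow_args = dict(x_col="filename", y_col="class",
                     classes=sorted(df["class"].unique()),
                     target_size=(320, 320), batch_size=32,
                     class_mode="binary")

    scores = []
    for train_df, val_df in make_folds(df, n_splits=n_splits):
        train_gen = train_datagen.flow_from_dataframe(train_df, **flow_args)
        val_gen = val_datagen.flow_from_dataframe(val_df, shuffle=False, **flow_args)
        model = build_model()  # new weights for every fold
        model.fit(train_gen, epochs=epochs, verbose=0)
        _, accuracy = model.evaluate(val_gen, verbose=0)
        scores.append(accuracy)
    return scores
```

You would then call something like scores = cross_validate(list_images(some_directory), build_model) and report the mean and spread of scores. Because the augmenting generator only ever sees the training part of each fold, the validation images stay unaugmented. If the two classes are imbalanced, sklearn.model_selection.StratifiedKFold may be a better fit than plain KFold.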
