
Effect of image width and height on transfer-learning model accuracy

I have almost 1000 images (1280x720 pixels, in 4 classes) of people performing certain hand gestures. The idea is to use transfer learning.

Below is the code, which uses Inception V3 with a target image size of 640x360.

from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
import os
path = 'E:/build/set_1/training'
# Get count of number of files in this folder and all subfolders
def get_num_files(path):
  if not os.path.exists(path):
    return 0
  return sum([len(files) for r, d, files in os.walk(path)])

# Get count of number of subfolders directly below the folder in path
def get_num_subfolders(path):
  if not os.path.exists(path):
    return 0
  return len(next(os.walk(path))[1])  # count only the folders directly below path
print(get_num_files(path))
print(get_num_subfolders(path))
def create_img_generator():
  return  ImageDataGenerator(
      preprocessing_function=preprocess_input,
      rotation_range=30,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True
  )
Image_width, Image_height = 640,360
Training_Epochs = 7
Batch_Size = 32
Number_FC_Neurons = 1024

train_dir = 'Desktop/Dataset/training'
validate_dir = 'Desktop/Dataset/validation'
num_train_samples = get_num_files(train_dir) 
num_classes = get_num_subfolders(train_dir)
num_validate_samples = get_num_files(validate_dir)
num_epoch = Training_Epochs
batch_size = Batch_Size
train_image_gen = create_img_generator()
# Validation images should only be preprocessed, not augmented
test_image_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

#   Connect each image generator to a folder containing the source images that it alters.
#   Training image generator
train_generator = train_image_gen.flow_from_directory(
  train_dir,
  target_size=(Image_height, Image_width),   # flow_from_directory expects (height, width)
  batch_size=batch_size,
  seed=42    # set seed for reproducibility
)
validation_generator = test_image_gen.flow_from_directory(
  validate_dir,
  target_size=(Image_height, Image_width),   # (height, width), as above
  batch_size=batch_size,
  seed=42    # set seed for reproducibility
)
InceptionV3_base_model = InceptionV3(weights='imagenet', include_top=False) #include_top=False excludes final FC layer
print('Inception v3 base model without last FC loaded')
#print(InceptionV3_base_model.summary())     # display the Inception V3 model hierarchy

# Define the layers in the new classification prediction 
x = InceptionV3_base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(Number_FC_Neurons, activation='relu')(x)        # new FC layer, random init
predictions = Dense(num_classes, activation='softmax')(x)  # new softmax layer

# Define trainable model which links input from the Inception V3 base model to the new classification prediction layers
model = Model(inputs=InceptionV3_base_model.input, outputs=predictions)

# print model structure diagram
print (model.summary())
print ('\nPerforming Transfer Learning')
#   Freeze all layers in the Inception V3 base model
for layer in InceptionV3_base_model.layers:
  layer.trainable = False
#   Define model compile for basic Transfer Learning
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the transfer learning model to the data from the generators.
# By using generators we can keep requesting sample images, and the generators will pull images from
# the training or validation folders and alter them slightly.
# fit_generator is the classic Keras API; newer Keras versions accept generators in model.fit directly
history_transfer_learning = model.fit_generator(
  train_generator,
  epochs=num_epoch,
  steps_per_epoch=num_train_samples // batch_size,
  validation_data=validation_generator,
  validation_steps=num_validate_samples // batch_size)

# Save transfer learning model
model.save('inceptionv3-original-image-transfer-learning.model')

The accuracy after 7 epochs is 84%.

The accuracy after 7 epochs is 86% if the target image size is 200x113.

How does image size affect the accuracy, and what image size should be used to make this model more accurate?

The ImageNet models, regardless of the framework you are using, are trained on smaller sizes (224x224 up to 299x299).
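For reference, you can check the canonical input size a Keras application model was trained with; a minimal sketch for Inception V3 (the ImageNet weights are downloaded on first use):

from keras.applications.inception_v3 import InceptionV3

# include_top=True uses the canonical classification input size
model = InceptionV3(weights='imagenet')
print(model.input_shape)   # (None, 299, 299, 3) for Inception V3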

Now, for object detection and image segmentation you can indeed benefit from higher resolutions in principle, since smaller objects can be detected better. There are also specific architectures which tackle this problem through smarter feature reuse, but that is beside the point of the question.

For a classification problem with a network pre-trained on smaller images, increasing the resolution may not actually improve the results. In fact, for this hand-gesture problem, the network may find the gestures harder to learn because of the larger feature set/complexity that comes with the bigger resolution.

If you obtain better results with smaller resolutions, that is not a problem; just make sure that when you test your model on your test set or in real life, you keep the same image distribution (the real-life images need to come from the same statistical distribution as the local training, validation, and test images).
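As a minimal sketch of what that means in practice (the test filename is hypothetical), resize and preprocess inference images exactly as during training:

import numpy as np
from keras.models import load_model
from keras.preprocessing import image
from keras.applications.inception_v3 import preprocess_input

Image_width, Image_height = 640, 360        # same values as in training
model = load_model('inceptionv3-original-image-transfer-learning.model')

# load_img's target_size is (height, width), matching the training resize
img = image.load_img('gesture_test.jpg', target_size=(Image_height, Image_width))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(model.predict(x))                     # softmax probabilities per class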

The truth is that you need to iterate through several resolutions and check which one is most suitable in your case; the main thing to keep in mind is to maintain the aspect ratio, to avoid introducing artefacts/distortions.
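Here is a rough sketch of such a sweep, reusing create_img_generator, the directory paths, and the sample counts from the question's code; build_model is a hypothetical helper that just wraps the same frozen-base architecture. The candidate sizes all preserve (approximately) the 16:9 ratio of the original 1280x720 frames.

from keras.applications.inception_v3 import InceptionV3, preprocess_input
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.preprocessing.image import ImageDataGenerator

def build_model(num_classes):
    # same frozen-base transfer-learning architecture as in the question
    base = InceptionV3(weights='imagenet', include_top=False)
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(1024, activation='relu')(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=base.input, outputs=predictions)
    for layer in base.layers:
        layer.trainable = False
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

results = {}
for height, width in [(360, 640), (225, 400), (113, 200)]:  # (height, width), ~16:9
    train_gen = create_img_generator().flow_from_directory(
        train_dir, target_size=(height, width), batch_size=batch_size, seed=42)
    val_gen = ImageDataGenerator(preprocessing_function=preprocess_input) \
        .flow_from_directory(validate_dir, target_size=(height, width),
                             batch_size=batch_size, seed=42)
    history = build_model(num_classes).fit_generator(
        train_gen,
        epochs=num_epoch,
        steps_per_epoch=num_train_samples // batch_size,
        validation_data=val_gen,
        validation_steps=num_validate_samples // batch_size)
    # the history key is 'val_accuracy' in newer Keras versions
    results[(height, width)] = max(history.history['val_acc'])

print(results)  # keep the resolution with the highest validation accuracy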
