
Val_acc doesn't increase

I'm trying to train a model with transfer learning using MobileNetV2, but my validation accuracy stops increasing at around 0.60. I've tried training only the top layers that I built, and after that I also tried training some of MobileNetV2's layers. Same result. How can I fix it? I should mention that I am new to deep learning and I am not sure the top layers I built are right. Feel free to correct me.

import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64

train_data_dir = "/content/FER2013/Training"
validation_data_dir = "/content/FER2013/PublicTest"


datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255, 
    validation_split=0)

train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    class_mode = 'categorical',
    batch_size=BATCH_SIZE)

val_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    class_mode = 'categorical',
    batch_size=BATCH_SIZE)
IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(7, activation='softmax')
])
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-5),  # I've tried with .Adam as well
              metrics=['accuracy'])
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=3)
early_stopper = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=6, mode='auto')
checkpointer = ModelCheckpoint('/content/weights.hd5', monitor='val_loss', verbose=1, save_best_only=True)
epochs = 50
learning_rate = 0.004  # I've tried other values as well
history_fine = model.fit(train_generator, 
                         steps_per_epoch=len(train_generator), 
                         epochs=epochs, 
                         callbacks=[lr_reducer, checkpointer, early_stopper],
                         validation_data=val_generator, 
                         validation_steps=len(val_generator))
Epoch 1/50
448/448 [==============================] - ETA: 0s - loss: 1.7362 - accuracy: 0.2929
Epoch 00001: val_loss improved from inf to 1.58818, saving model to /content/weights.hd5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 166s 370ms/step - loss: 1.7362 - accuracy: 0.2929 - val_loss: 1.5882 - val_accuracy: 0.4249
Epoch 2/50
448/448 [==============================] - ETA: 0s - loss: 1.3852 - accuracy: 0.4664
Epoch 00002: val_loss improved from 1.58818 to 1.31690, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 1.3852 - accuracy: 0.4664 - val_loss: 1.3169 - val_accuracy: 0.4827
Epoch 3/50
448/448 [==============================] - ETA: 0s - loss: 1.2058 - accuracy: 0.5277
Epoch 00003: val_loss improved from 1.31690 to 1.21979, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 1.2058 - accuracy: 0.5277 - val_loss: 1.2198 - val_accuracy: 0.5271
Epoch 4/50
448/448 [==============================] - ETA: 0s - loss: 1.0828 - accuracy: 0.5861
Epoch 00004: val_loss improved from 1.21979 to 1.18972, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 166s 370ms/step - loss: 1.0828 - accuracy: 0.5861 - val_loss: 1.1897 - val_accuracy: 0.5533
Epoch 5/50
448/448 [==============================] - ETA: 0s - loss: 0.9754 - accuracy: 0.6380
Epoch 00005: val_loss improved from 1.18972 to 1.13336, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 0.9754 - accuracy: 0.6380 - val_loss: 1.1334 - val_accuracy: 0.5743
Epoch 6/50
448/448 [==============================] - ETA: 0s - loss: 0.8761 - accuracy: 0.6848
Epoch 00006: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 342ms/step - loss: 0.8761 - accuracy: 0.6848 - val_loss: 1.1348 - val_accuracy: 0.5882
Epoch 7/50
448/448 [==============================] - ETA: 0s - loss: 0.7783 - accuracy: 0.7264
Epoch 00007: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 341ms/step - loss: 0.7783 - accuracy: 0.7264 - val_loss: 1.1392 - val_accuracy: 0.5893
Epoch 8/50
448/448 [==============================] - ETA: 0s - loss: 0.6832 - accuracy: 0.7638
Epoch 00008: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 342ms/step - loss: 0.6832 - accuracy: 0.7638 - val_loss: 1.1542 - val_accuracy: 0.6052

Since your validation loss is increasing while your training loss decreases, I think you may be overfitting your training set. Some things that could help are:

  • Use fewer dense layers. There seem to be too many, but I could be wrong since I don't know what problem you are solving.
  • Add dropout layers after every dense layer.
  • Increase the dropout rates.
  • Use data augmentation on your training set (since you are already using ImageDataGenerator it won't be that hard; see the sketch after this list).
  • Reduce the number of neurons in dense layers.
  • Use regularization.

You can try applying any of them or multiple of them at the same time. Tweaking your model is a lot of trial and error, do some experiments, and keep the model that achieves the best performance.
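For concreteness, here is a minimal sketch combining several of these suggestions: a smaller head with a single 128-unit dense layer, dropout right after it, L2 regularization, and augmentation through ImageDataGenerator. The augmentation ranges and the L2 factor are illustrative values, not tuned for this dataset.

import tensorflow as tf

# augmentation happens on the fly, so only the training generator needs it
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,       # small random rotations
    width_shift_range=0.1,   # small random horizontal shifts
    height_shift_range=0.1,  # small random vertical shifts
    horizontal_flip=True)    # random mirroring

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.5),  # dropout directly after the dense layer
    tf.keras.layers.Dense(7, activation='softmax')
])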

With your training loss decreasing and your validation loss increasing, it appears that you are in an overfitting situation. The more complex your model, the higher the chance of this happening, so I suggest you try to simplify it. Try removing all the dense layers except for the top layer which produces your classifications, run that, and see how it does. My experience with MobileNet is that it should work well. If not, add one additional dense layer with about 128 neurons followed by a dropout layer, and use the dropout rate to adjust for overfitting. If you are still overfitting, you might want to consider adding regularizers to that dense layer (see the Keras regularizers documentation).

It could also be the case that you need more training samples. Use the ImageDataGenerator to augment your data set, for example by setting horizontal_flip=True.

I notice you did not include the MobileNetV2 preprocessing function in the ImageDataGenerator. I believe this is necessary, so modify the code as shown below. I have also found I get the best results with MobileNet if I train the entire model, so before you compile your model add the loop below as well.

# use MobileNetV2's own preprocessing instead of rescale=1./255
# (preprocess_input already scales pixels to the [-1, 1] range)
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input)

# make every layer trainable before compiling
for layer in model.layers:
    layer.trainable = True

As an aside, you can delete the GlobalAveragePooling2D layer from your model and instead set the parameter pooling='max' within MobileNetV2. I also see you start out with a very small learning rate; try something like .005. Since you have the ReduceLROnPlateau callback, it will adjust the rate if it is too large, but the higher starting value will allow you to converge faster. A sketch of both changes follows.
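Here is a minimal sketch of that aside, assuming the same 7-class setup as the question. Note that pooling='max' applies global max pooling to the final feature map, so the separate pooling layer is no longer needed.

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling='max')  # pools the last feature map; replaces the GlobalAveragePooling2D layer

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dense(7, activation='softmax')
])

model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(0.005),  # larger start; ReduceLROnPlateau can shrink it
              metrics=['accuracy'])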
