
Model accuracy and loss not improving in CNN

I am using the LeNet architecture below to train my image classification model, and I have noticed that neither training nor validation accuracy improves from epoch to epoch. Can anyone with expertise in this area explain what might have gone wrong?

Training samples: 110 images belonging to 2 classes. Validation: 50 images belonging to 2 classes.

#LeNet

import keras 
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

#import dropout class if needed
from keras.layers import Dropout

from keras import regularizers

model = Sequential()
#Layer 1
#Conv Layer 1
model.add(Conv2D(filters = 6, 
                 kernel_size = 5, 
                 strides = 1, 
                 activation = 'relu', 
                 input_shape = (32,32,3)))
#Pooling layer 1
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Layer 2
#Conv Layer 2
model.add(Conv2D(filters = 16, 
                 kernel_size = 5,
                 strides = 1,
                 activation = 'relu',
                 input_shape = (14,14,6)))
#Pooling Layer 2
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Flatten
model.add(Flatten())
#Layer 3
#Fully connected layer 1
model.add(Dense(units=128,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#Layer 4
#Fully connected layer 2
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))

#layer 5
#Fully connected layer 3
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))

#layer 6
#Fully connected layer 4
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))

#Layer 7
#Output Layer
model.add(Dense(units = 2, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

from keras.preprocessing.image import ImageDataGenerator

#Image Augmentation
train_datagen = ImageDataGenerator(
        rescale=1./255, #rescaling pixel value bw 0 and 1
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

#Just Feature scaling
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
       '/Dataset/Skin_cancer/training',
        target_size=(32, 32),
        batch_size=32,
        class_mode='categorical')

test_set = test_datagen.flow_from_directory(
        '/Dataset/Skin_cancer/testing',
        target_size=(32, 32),
        batch_size=32,
        class_mode='categorical')

model.fit_generator(
        training_set,
        steps_per_epoch=50,   # batches drawn per epoch, not the number of images
        epochs=25,
        validation_data=test_set,
        validation_steps=10)  # validation batches drawn per epoch

Epoch 1/25
50/50 [==============================] - 52s 1s/step - loss: 0.8568 - accuracy: 0.4963 - val_loss: 0.7004 - val_accuracy: 0.5000
Epoch 2/25
50/50 [==============================] - 50s 1s/step - loss: 0.6940 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/25
50/50 [==============================] - 48s 967ms/step - loss: 0.6932 - accuracy: 0.5065 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 4/25
50/50 [==============================] - 50s 1s/step - loss: 0.6932 - accuracy: 0.4824 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 5/25
50/50 [==============================] - 49s 974ms/step - loss: 0.6932 - accuracy: 0.4949 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 6/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4854 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 7/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5015 - val_loss: 0.6918 - val_accuracy: 0.5000
Epoch 8/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4986 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 9/25
50/50 [==============================] - 49s 973ms/step - loss: 0.6932 - accuracy: 0.5000 - val_loss: 0.6929 - val_accuracy: 0.5000
Epoch 10/25
50/50 [==============================] - 50s 1s/step - loss: 0.6931 - accuracy: 0.5044 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 11/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5022 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 12/25

[Plot: accuracy per epoch]

[Plot: loss per epoch]

Remove all the kernel_initializer='uniform' arguments from your layers; don't specify anything here. The default initializer, glorot_uniform, is the highly recommended one (and uniform is a particularly bad one).

As a general rule, keep in mind that the default values for such rather advanced settings are there for your convenience; they are implicitly recommended, and you had better not mess with them unless you have a specific reason to do so and you know exactly what you are doing.

For the kernel_initializer argument in particular, I have started to believe that it has caused a lot of unnecessary pain (see here for the most recent example).

Also, dropout should not be used by default, especially in cases like this one where the model seems to struggle to learn anything; start without any dropout (comment out the respective layers), and only add it back if you see signs of overfitting.
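Taken together, a minimal sketch of those two changes applied to the first fully connected block of the model above (default initializer, dropout commented out):

#Layer 3
#Fully connected layer 1 - rely on the default glorot_uniform initializer
model.add(Dense(units=128, activation='relu',
                kernel_regularizer=regularizers.l2(0.01)))
#model.add(Dropout(rate=0.2))  # add back only if overfitting shows up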

Most importantly, you are using loss = 'categorical_crossentropy'; change it to loss = 'binary_crossentropy', as you have just 2 classes. Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.

As @desertnaut rightly mentioned, categorical_crossentropy goes hand in hand with a softmax activation in the last layer, and if you change the loss to binary_crossentropy the last activation should also be changed to sigmoid.
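In code, the combined change looks roughly like this. Note one detail the answers above leave implicit: with class_mode='binary' the labels are a single 0/1 column, so the output layer also needs a single unit rather than two.

#Output Layer - single sigmoid unit for the 2-class (binary) case
model.add(Dense(units = 1, activation = 'sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

training_set = train_datagen.flow_from_directory(
        '/Dataset/Skin_cancer/training',
        target_size=(32, 32),
        batch_size=32,
        class_mode='binary')   # was 'categorical'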

Other Improvements:

  1. You have very limited data (160 images), and you have used nearly a third of it (50 images) as validation data.
  2. Although you are building a model for image classification, you have just two Conv2D layers and four Dense layers. The Dense layers add a huge number of weights to be learnt. Add a few more Conv2D layers and reduce the number of Dense layers.
  3. Set batch_size = 1 and remove steps_per_epoch. As you have very little input, let every epoch have the same number of steps as there are input records.
  4. Use the default glorot_uniform kernel initializer.
  5. To further tune your model, stack multiple Conv2D layers, followed by a GlobalAveragePooling2D layer, an FC layer, and a final output layer (see the sketch after this list).
  6. Use data augmentation techniques like horizontal_flip, vertical_flip, shear_range, and zoom_range of ImageDataGenerator to increase the number of training and validation images.
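A sketch combining suggestions 2, 3, 5 and 6 above. The filter counts and layer depths are illustrative choices, not prescriptions from the answer, and the sigmoid output follows the earlier binary_crossentropy advice:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from keras.preprocessing.image import ImageDataGenerator

#Deeper conv stack, single small FC head
model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=(32, 32, 3)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(64, kernel_size=3, activation='relu'))
model.add(Conv2D(64, kernel_size=3, activation='relu'))
#GlobalAveragePooling2D replaces Flatten + the stack of big Dense layers
model.add(GlobalAveragePooling2D())
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#Augmentation with vertical_flip added (suggestion 6)
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2,
                                   horizontal_flip=True, vertical_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
        '/Dataset/Skin_cancer/training', target_size=(32, 32),
        batch_size=1, class_mode='binary')
test_set = test_datagen.flow_from_directory(
        '/Dataset/Skin_cancer/testing', target_size=(32, 32),
        batch_size=1, class_mode='binary')

#batch_size=1 and no steps_per_epoch: Keras infers the step count from the
#generator length, so each epoch walks through every image exactly once
model.fit_generator(training_set, epochs=25, validation_data=test_set)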

Moving the comments to the answer section, as suggested by @desertnaut -

Question - Thanks, yes, too little data is the problem, as I figured. One additional question - why does adding more Dense layers than Conv layers negatively affect the model? Is there any rule to follow when deciding how many Conv and Dense layers to use? – Arun_Ramji_Shanmugam 2 days ago

Answer - To answer the first part of your question: a Conv2D layer maintains the spatial information of the image, and the weights to be learnt depend on the kernel size and stride specified in the layer, whereas a Dense layer needs the output of Conv2D to be flattened before it can be used, hence losing the spatial information. Also, Dense layers add a much larger number of weights; for example, two Dense layers of 512 units add 512*512 = 262144 params or weights to the model (which have to be learnt by the model). That means you have to train for more epochs, and with good hyperparameter settings, to learn these weights. – Tensorflow Warriors 2 days ago
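You can verify that weight count directly in Keras; a minimal illustration (the reported 262656 is the 512*512 weights plus 512 biases):

from keras.models import Sequential
from keras.layers import Dense

m = Sequential()
m.add(Dense(512, activation='relu', input_shape=(512,)))
m.add(Dense(512, activation='relu'))
m.summary()  # each layer reports 512*512 + 512 = 262656 params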

Answer - To answer the second part of your question: use systematic experiments to discover what works best for your specific dataset. It also depends on the processing power you have. Remember, deeper networks are generally better, at the cost of more data and increased complexity of learning. A conventional approach is to look for similar problems and deep learning architectures that have already been shown to work. We also have the flexibility to use pretrained models like ResNet, VGG, etc., freezing part of the layers and training the remaining ones. – Tensorflow Warriors 2 days ago

Question - Thank you for the detailed answer! If you don't mind, one more question - when we use an already-trained model (maybe some of its layers), doesn't it need to have been trained on the same kind of input data as the data we are going to work with? – Arun_Ramji_Shanmugam yesterday

Answer - The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world. You can find a transfer learning example with an explanation here - tensorflow.org/tutorials/images/transfer_learning. – Tensorflow Warriors yesterday
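A minimal sketch along the lines of that tutorial, assuming TensorFlow 2.x. MobileNetV2 and the 96x96 input size are illustrative choices, not from the answers above; the 32x32 images here would need to be resized (e.g. target_size=(96, 96) in flow_from_directory):

import tensorflow as tf

#Pretrained ImageNet base, frozen; only the new head is trained
base = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3),
                                         include_top=False, weights='imagenet')
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation='sigmoid')  # binary head, as above
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])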
