
Model accuracy and loss not improving in CNN

I am using the LeNet architecture below to train an image classification model. I have noticed that neither the training nor the validation accuracy improves from one epoch to the next. Can anyone with expertise in this area explain what might have gone wrong?

Training samples: 110 images belonging to 2 classes. Validation: 50 images belonging to 2 classes.

#LeNet

import keras 
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

#import dropout class if needed
from keras.layers import Dropout

from keras import regularizers

model = Sequential()
#Layer 1
#Conv Layer 1
model.add(Conv2D(filters = 6, 
                 kernel_size = 5, 
                 strides = 1, 
                 activation = 'relu', 
                 input_shape = (32,32,3)))
#Pooling layer 1
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Layer 2
#Conv Layer 2
model.add(Conv2D(filters = 16, 
                 kernel_size = 5,
                 strides = 1,
                 activation = 'relu',
                 input_shape = (14,14,6)))
#Pooling Layer 2
model.add(MaxPooling2D(pool_size = 2, strides = 2))
#Flatten
model.add(Flatten())
#Layer 3
#Fully connected layer 1
model.add(Dense(units=128,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))
#Layer 4
#Fully connected layer 2
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))

#layer 5
#Fully connected layer 3
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))

#layer 6
#Fully connected layer 4
model.add(Dense(units=64,activation='relu',kernel_initializer='uniform'
                     ,kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(rate=0.2))

#Layer 7
#Output Layer
model.add(Dense(units = 2, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

from keras.preprocessing.image import ImageDataGenerator

#Image Augmentation
train_datagen = ImageDataGenerator(
        rescale=1./255, #rescaling pixel value bw 0 and 1
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

#Just Feature scaling
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory(
       '/Dataset/Skin_cancer/training',
        target_size=(32, 32),
        batch_size=32,
        class_mode='categorical')

test_set = test_datagen.flow_from_directory(
        '/Dataset/Skin_cancer/testing',
        target_size=(32, 32),
        batch_size=32,
        class_mode='categorical')

model.fit_generator(
        training_set,
        steps_per_epoch=50,   #number of input (image)
        epochs=25,
        validation_data=test_set,
        validation_steps=10)          # number of training sample

Epoch 1/25
50/50 [==============================] - 52s 1s/step - loss: 0.8568 - accuracy: 0.4963 - val_loss: 0.7004 - val_accuracy: 0.5000
Epoch 2/25
50/50 [==============================] - 50s 1s/step - loss: 0.6940 - accuracy: 0.5000 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/25
50/50 [==============================] - 48s 967ms/step - loss: 0.6932 - accuracy: 0.5065 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 4/25
50/50 [==============================] - 50s 1s/step - loss: 0.6932 - accuracy: 0.4824 - val_loss: 0.6933 - val_accuracy: 0.5000
Epoch 5/25
50/50 [==============================] - 49s 974ms/step - loss: 0.6932 - accuracy: 0.4949 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 6/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4854 - val_loss: 0.6931 - val_accuracy: 0.5000
Epoch 7/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5015 - val_loss: 0.6918 - val_accuracy: 0.5000
Epoch 8/25
50/50 [==============================] - 51s 1s/step - loss: 0.6932 - accuracy: 0.4986 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 9/25
50/50 [==============================] - 49s 973ms/step - loss: 0.6932 - accuracy: 0.5000 - val_loss: 0.6929 - val_accuracy: 0.5000
Epoch 10/25
50/50 [==============================] - 50s 1s/step - loss: 0.6931 - accuracy: 0.5044 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 11/25
50/50 [==============================] - 49s 976ms/step - loss: 0.6931 - accuracy: 0.5022 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 12/25

[Figure: accuracy per iteration]

[Figure: loss per iteration]

Remove all kernel_initializer='uniform' arguments from your layers. Don't specify anything here: the default initializer, glorot_uniform, is the highly recommended one (and uniform is a particularly bad one).

As a general rule, keep in mind that the default values for such rather advanced settings are there for your convenience. They are implicitly recommended, and you had better not change them unless you have a specific reason to do so and know exactly what you are doing.

For the kernel_initializer argument in particular, I have come to believe that it has caused a lot of unnecessary pain to people (just see here for the most recent example).

Also, dropout should not be used by default, especially in cases like this one, where the model seems to struggle to learn anything. Start without any dropout (comment out the respective layers), and only add it back if you see signs of overfitting.
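As a rough sketch of what these two changes look like when applied to the model in the question (an illustration under the assumption that everything else stays as posted, not a guaranteed fix), the layers below drop the kernel_initializer argument and the Dropout layers. The redundant input_shape on the second Conv2D layer is also left out, since Keras infers it.

```python
from keras import regularizers
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential

model = Sequential()
# Convolutional part, unchanged from the question
model.add(Conv2D(filters=6, kernel_size=5, strides=1,
                 activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D(pool_size=2, strides=2))
model.add(Conv2D(filters=16, kernel_size=5, strides=1, activation='relu'))
model.add(MaxPooling2D(pool_size=2, strides=2))
model.add(Flatten())

# Dense part: no kernel_initializer (the default glorot_uniform is used)
# and no Dropout layers for now
for units in (128, 64, 64, 64):
    model.add(Dense(units=units, activation='relu',
                    kernel_regularizer=regularizers.l2(0.01)))

# Output layer and compile step, unchanged from the question
model.add(Dense(units=2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

If the model starts to learn with this version, dropout can be reintroduced one layer at a time once the training accuracy clearly exceeds the validation accuracy.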

Most importantly, you are using loss = 'categorical_crossentropy'; change it to loss = 'binary_crossentropy', as you have just 2 classes. Also change class_mode='categorical' to class_mode='binary' in flow_from_directory.

As @desertnaut rightly mentioned, categorical_crossentropy goes hand in hand with a softmax activation in the last layer. If you change the loss to binary_crossentropy, the last activation should also be changed to sigmoid, with a single output unit (units = 1), because class_mode='binary' yields one 0/1 label per image.
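One way to read the training log in the question: the loss settles at about 0.6931, which is the cross-entropy of a model that assigns probability 0.5 to each of the two classes, i.e. a model that has learned nothing yet. The small check below is only an illustration in plain Python; the helper name categorical_crossentropy is defined here and is not the Keras function.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy for one sample: -sum(true_i * log(pred_i))."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# One-hot label for class 0, prediction split evenly between both classes
uninformed_loss = categorical_crossentropy([1.0, 0.0], [0.5, 0.5])

print(round(uninformed_loss, 4))  # 0.6931, i.e. ln(2)
```

A loss pinned at this value, together with an accuracy of about 0.5, suggests the network outputs the same prediction for every image, so any change should be judged by whether the loss moves below ln(2).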

Other improvements:

  1. You have very limited data (160 images), and you hold out 50 of them (about 31%) for validation.
  2. Although you are building a model for image classification, you have only two Conv2D layers and four Dense layers. The Dense layers add a large number of weights to be learnt. Add a few more Conv2D layers and reduce the number of Dense layers.
  3. Set batch_size = 1 and remove steps_per_epoch. As you have very little input, let every epoch have the same number of steps as there are input records.
  4. Use the default glorot_uniform kernel initializer.
  5. To tune the model further, build it from multiple Conv2D layers followed by a GlobalAveragePooling2D layer, a fully connected layer, and a final softmax layer.
  6. Use the data augmentation options of ImageDataGenerator, such as horizontal_flip, vertical_flip, shear_range, and zoom_range, to increase the number of training and validation images.
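The step counts behind point 3 can be checked with a little arithmetic. The sketch below uses only the image counts, batch size, and step settings from the question; the helper steps_for_one_pass is a name introduced here for illustration.

```python
import math

train_images = 110
val_images = 50

def steps_for_one_pass(num_images, batch_size):
    """Number of batches needed to see every image exactly once."""
    return math.ceil(num_images / batch_size)

# Settings from the question: batch_size=32, steps_per_epoch=50
steps_needed = steps_for_one_pass(train_images, 32)
passes_per_epoch = 50 / steps_needed

# Suggested setting: batch_size=1, one step per image
steps_with_batch_of_one = steps_for_one_pass(train_images, 1)

# Share of all images held out for validation
val_share = val_images / (train_images + val_images)

print(steps_needed)             # 4
print(passes_per_epoch)         # 12.5
print(steps_with_batch_of_one)  # 110
print(val_share)                # 0.3125
```

With the original settings each reported "epoch" therefore cycles through the 110 training images about 12.5 times, which is why leaving steps_per_epoch unset (so it follows the generator length) gives more meaningful per-epoch numbers.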

Moving the comments to the answer section, as suggested by @desertnaut:

Question - Thanks, yes. Too little data is the problem, as I figured. One additional question: why does having more dense layers than conv layers affect the model negatively? Is there any rule to follow when deciding how many conv and dense layers to use? – Arun_Ramji_Shanmugam 2 days ago

Answer - To answer the first part of your question: a Conv2D layer preserves the spatial information of the image, and the number of weights it has to learn depends on the kernel size and the number of filters, whereas a Dense layer needs the output of Conv2D to be flattened before it can be used, which loses the spatial information. A Dense layer also adds many more weights. For example, connecting two Dense layers of 512 units adds 512*512 = 262144 weights that the model has to learn. That means you have to train for more epochs, and with good hyperparameter settings, to learn these weights. – Tensorflow Warriors 2 days ago
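To put numbers on this for the model in the question, the sketch below counts trainable parameters per layer with the standard formulas (weights plus one bias per unit or filter). The helper names are introduced here for illustration, and the flattened size of 400 assumes the 32x32x3 input and unpadded 5x5 convolutions from the posted code.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights of a square kernel per filter, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units

# Conv part of the question's model: 32x32x3 -> 28x28x6 -> 14x14x6
# -> 10x10x16 -> 5x5x16, so Flatten yields 5 * 5 * 16 = 400 values
conv_total = conv2d_params(5, 3, 6) + conv2d_params(5, 6, 16)

# Dense part: 400 -> 128 -> 64 -> 64 -> 64 -> 2
sizes = [400, 128, 64, 64, 64, 2]
dense_total = sum(dense_params(i, o) for i, o in zip(sizes, sizes[1:]))

print(conv_total)              # 2872
print(dense_total)             # 68034
print(dense_params(512, 512))  # 262656 (262144 weights + 512 biases)
```

Under these assumptions the dense layers hold roughly 24 times as many parameters as the two convolutional layers, all of which have to be fitted from 110 training images.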

Answer - To answer the second part of your question: use systematic experiments to discover what works best for your specific dataset. It also depends on the processing power you have. Remember that deeper networks tend to perform better, but at the cost of needing more data and being harder to train. A conventional approach is to look for similar problems and for deep learning architectures that have already been shown to work on them. We also have the flexibility to use pretrained models such as ResNet or VGG, by freezing part of the layers and training the remaining ones. – Tensorflow Warriors 2 days ago

Question - Thank you for the detailed answer! If you don't mind one more question: when we use an already trained model (or some of its layers), doesn't it need to have been trained on the same kind of input data as the data we are going to work with? – Arun_Ramji_Shanmugam yesterday

Answer - The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world. You can find a transfer learning example with an explanation here: tensorflow.org/tutorials/images/transfer_learning. – Tensorflow Warriors yesterday
