
Validation accuracy/loss goes up and down linearly with every consecutive epoch

I'm training a CNN in Keras with a TensorFlow backend, using the following model architecture for a binary classification problem. I've divided approximately 41k images into training, validation, and test sets in a 70:25:5 ratio, giving 29k images in the training set, 10k in validation, and 2k in the test set.

There is no class imbalance: there are approximately 20k samples in each of the pos and neg classes.

# Imports assumed for this snippet (Keras 2.x with TensorFlow backend);
# input_shape, image dimensions, data paths, epochs and sample counts are defined elsewhere.
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense
from keras import optimizers
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator

model = Sequential()
model.add(Conv2D(32, (7, 7), padding = 'same', input_shape=input_shape))
model.add(Conv2D(32, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(32, (7, 7), padding = 'same'))
model.add(Conv2D(32, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.6))

model.add(Conv2D(32, (7, 7), padding = 'same'))
model.add(Conv2D(32, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.6))

model.add(Conv2D(64, (7, 7), padding = 'same'))
model.add(Conv2D(64, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.6))

model.add(Conv2D(64, (7, 7), padding = 'same'))
model.add(Conv2D(64, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.6))

model.add(Conv2D(64, (7, 7), padding = 'same'))
model.add(Conv2D(64, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.6))

model.add(Conv2D(128, (7, 7), padding = 'same'))
model.add(Conv2D(128, (7, 7), padding = 'same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.6))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))

model.add(Dense(512))
model.add(Activation('relu'))

model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.Adam(lr=3e-5),
              metrics=['accuracy'])

checkpoint = ModelCheckpoint(filepath='checkpointORCA_adam-{epoch:02d}-{val_loss:.2f}.h5', monitor='val_loss', verbose=0, save_best_only=True)

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=20, min_lr=1e-8)

train_datagen = ImageDataGenerator(rescale=1. / 255,
        shear_range=0.2,
        zoom_range=0.2)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)

# Change the batchsize according to your system RAM
train_batchsize = 32  # changed them to 64 and 128 respectively, but same results
val_batchsize = 32

train_generator = train_datagen.flow_from_directory(
    train_data_path,
    target_size=(img_width, img_height),
    batch_size=train_batchsize,
    class_mode='binary',
    shuffle=True)

# train_generator.reset()
# validation_generator.reset()
validation_generator = test_datagen.flow_from_directory(
    validation_data_path,
    target_size=(img_width, img_height),
    batch_size=val_batchsize,
    class_mode='binary',
    shuffle=False)

# validation_generator.reset()

history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // train_batchsize,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // val_batchsize,
    callbacks=[checkpoint, reduce_lr])
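
To make the oscillation easier to see, here is a minimal plotting sketch (assuming matplotlib is available; `history` is the object returned by `fit_generator` above, and the metric keys `acc`/`val_acc` match the Keras version whose log output appears below):

import matplotlib.pyplot as plt

# Plot per-epoch training vs. validation curves from the History object.
epochs_range = range(1, len(history.history['loss']) + 1)

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, history.history['loss'], label='train loss')
plt.plot(epochs_range, history.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs_range, history.history['acc'], label='train acc')
plt.plot(epochs_range, history.history['val_acc'], label='val acc')
plt.xlabel('epoch')
plt.legend()
plt.show()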

These are the epochs from the training run, where the validation accuracy fluctuates in an alternating pattern: it first goes up and then comes down by nearly the same amount. What could be the reason for this?

I've checked nearly every other answer to this kind of question: my data is normalized, the training set is properly shuffled, and the learning rate is small and well within the range that other researchers in similar problem domains have found success with.
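
As a quick sanity check on those points, one batch from each generator can be inspected directly (a minimal sketch using the `train_generator` and `validation_generator` defined above):

# Pull one batch from each generator and inspect pixel ranges and labels.
x_batch, y_batch = next(train_generator)
print('train pixel range:', x_batch.min(), x_batch.max())  # expect roughly [0, 1] after rescale
print('train labels:', y_batch[:10])                       # shuffled, so 0s and 1s should be mixed

x_val, y_val = next(validation_generator)
print('val pixel range:', x_val.min(), x_val.max())
print('val labels:', y_val[:10])                           # shuffle=False, so may be a single class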

Found 29124 images belonging to 2 classes.
Found 10401 images belonging to 2 classes.
Epoch 1/60
910/910 [==============================] - 530s 582ms/step - loss: 0.6105 - acc: 0.6161 - val_loss: 0.2298 - val_acc: 0.9548
Epoch 2/60
910/910 [==============================] - 520s 571ms/step - loss: 0.3590 - acc: 0.8480 - val_loss: 0.8340 - val_acc: 0.6604
Epoch 3/60
910/910 [==============================] - 520s 571ms/step - loss: 0.3160 - acc: 0.8695 - val_loss: 0.0983 - val_acc: 0.9558
Epoch 4/60
910/910 [==============================] - 528s 580ms/step - loss: 0.2925 - acc: 0.8830 - val_loss: 0.5063 - val_acc: 0.8385
Epoch 5/60
910/910 [==============================] - 529s 581ms/step - loss: 0.2718 - acc: 0.8895 - val_loss: 0.0541 - val_acc: 0.9745
Epoch 6/60
910/910 [==============================] - 530s 583ms/step - loss: 0.2523 - acc: 0.8982 - val_loss: 0.5849 - val_acc: 0.8060
Epoch 7/60
910/910 [==============================] - 528s 580ms/step - loss: 0.2368 - acc: 0.9076 - val_loss: 0.0682 - val_acc: 0.9695
Epoch 8/60
910/910 [==============================] - 529s 582ms/step - loss: 0.2168 - acc: 0.9160 - val_loss: 0.6503 - val_acc: 0.7660
Epoch 9/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1996 - acc: 0.9213 - val_loss: 0.0339 - val_acc: 0.9850
Epoch 10/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1896 - acc: 0.9258 - val_loss: 0.5710 - val_acc: 0.8033
Epoch 11/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1814 - acc: 0.9285 - val_loss: 0.0391 - val_acc: 0.9834
Epoch 12/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1715 - acc: 0.9342 - val_loss: 0.6787 - val_acc: 0.7792
Epoch 13/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1678 - acc: 0.9361 - val_loss: 0.0451 - val_acc: 0.9796
Epoch 14/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1683 - acc: 0.9356 - val_loss: 0.7874 - val_acc: 0.7306
Epoch 15/60
910/910 [==============================] - 528s 580ms/step - loss: 0.1618 - acc: 0.9387 - val_loss: 0.0483 - val_acc: 0.9761
Epoch 16/60
910/910 [==============================] - 528s 581ms/step - loss: 0.1569 - acc: 0.9398 - val_loss: 0.9105 - val_acc: 0.7060
Epoch 17/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1566 - acc: 0.9397 - val_loss: 0.0380 - val_acc: 0.9853
Epoch 18/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1506 - acc: 0.9416 - val_loss: 0.7649 - val_acc: 0.7435
Epoch 19/60
910/910 [==============================] - 527s 580ms/step - loss: 0.1497 - acc: 0.9429 - val_loss: 0.0507 - val_acc: 0.9778
Epoch 20/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1476 - acc: 0.9439 - val_loss: 0.7189 - val_acc: 0.7665
Epoch 21/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1426 - acc: 0.9447 - val_loss: 0.0377 - val_acc: 0.9873
Epoch 22/60
910/910 [==============================] - 528s 580ms/step - loss: 0.1407 - acc: 0.9463 - val_loss: 0.7066 - val_acc: 0.7817
Epoch 23/60
910/910 [==============================] - 526s 578ms/step - loss: 0.1427 - acc: 0.9444 - val_loss: 0.0376 - val_acc: 0.9877
Epoch 24/60
910/910 [==============================] - 528s 580ms/step - loss: 0.1373 - acc: 0.9467 - val_loss: 0.6619 - val_acc: 0.8023
Epoch 25/60
910/910 [==============================] - 528s 580ms/step - loss: 0.1362 - acc: 0.9466 - val_loss: 0.0457 - val_acc: 0.9844
Epoch 26/60
910/910 [==============================] - 529s 582ms/step - loss: 0.1350 - acc: 0.9474 - val_loss: 0.8683 - val_acc: 0.7046
Epoch 27/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1339 - acc: 0.9492 - val_loss: 0.0411 - val_acc: 0.9855
Epoch 28/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1339 - acc: 0.9499 - val_loss: 0.9552 - val_acc: 0.6762
Epoch 29/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1343 - acc: 0.9488 - val_loss: 0.0446 - val_acc: 0.9859
Epoch 30/60
910/910 [==============================] - 528s 580ms/step - loss: 0.1282 - acc: 0.9513 - val_loss: 0.8127 - val_acc: 0.7298
Epoch 31/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1286 - acc: 0.9504 - val_loss: 0.0484 - val_acc: 0.9857
Epoch 32/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1258 - acc: 0.9506 - val_loss: 0.5007 - val_acc: 0.8479
Epoch 33/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1301 - acc: 0.9495 - val_loss: 0.0467 - val_acc: 0.9859
Epoch 34/60
910/910 [==============================] - 529s 581ms/step - loss: 0.1253 - acc: 0.9516 - val_loss: 0.6061 - val_acc: 0.8056
Epoch 35/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1259 - acc: 0.9521 - val_loss: 0.0469 - val_acc: 0.9873
Epoch 36/60
910/910 [==============================] - 528s 580ms/step - loss: 0.1249 - acc: 0.9511 - val_loss: 0.8658 - val_acc: 0.7121
Epoch 37/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1206 - acc: 0.9548 - val_loss: 0.0459 - val_acc: 0.9869
Epoch 38/60
910/910 [==============================] - 527s 580ms/step - loss: 0.1229 - acc: 0.9512 - val_loss: 0.4516 - val_acc: 0.8646
Epoch 39/60
910/910 [==============================] - 527s 579ms/step - loss: 0.1206 - acc: 0.9528 - val_loss: 0.0469 - val_acc: 0.9861
Epoch 40/60

The graph below is not from this problem, but it shows a situation similar to what I'm asking about:

[image: loss curve]

[image: accuracy curve]

A couple of things I tried to get the curve to look like the one below:

[image: target training curve]

  • Reduced the model's capacity to a few layers with few neurons (see the sketch after this list).
  • Lowered the learning rate and increased the batch size from 32 to 128.
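
A minimal sketch of what the reduced-capacity variant could look like (the exact filter counts and layer sizes here are illustrative assumptions, not a verified configuration):

# Two small conv blocks and one small dense layer instead of seven blocks.
small_model = Sequential()
small_model.add(Conv2D(16, (3, 3), padding='same', activation='relu',
                       input_shape=input_shape))
small_model.add(MaxPooling2D(pool_size=(2, 2)))
small_model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
small_model.add(MaxPooling2D(pool_size=(2, 2)))
small_model.add(Flatten())
small_model.add(Dense(64, activation='relu'))
small_model.add(Dropout(0.5))
small_model.add(Dense(1, activation='sigmoid'))

small_model.compile(loss='binary_crossentropy',
                    optimizer=optimizers.Adam(lr=1e-5),  # lowered learning rate
                    metrics=['accuracy'])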

This may not be the only possible solution, but here are a few simple tips for getting things right. Helpful links: 1 and 2

I am also planning to train for more epochs.
