
VGG16 fine tuning

I'm trying to fine-tune VGG16, but I sometimes get a validation accuracy that is constant: sometimes it is stuck at 0.0 and sometimes at 1.0, and the test accuracy shows the same value. It has also happened that the training accuracy stays constant.

These are some examples:

Adam, bs: 64, lr: 0.001

train_acc = [0.45828044, 0.4580425, 0.45812184, 0.45820114, 0.45820114, 0.45812184, 0.45820114, 0.45820114, 0.45820114, 0.4580425, 0.45820114, 0.45820114, 0.45812184, 0.45828044, 0.45820114, 0.45828044, 0.45812184, 0.45820114, 0.45812184, 0.45828044, 0.45820114, 0.45820114, 0.45812184, 0.45812184, 0.45820114, 0.45812184, 0.45828044, 0.45820114, 0.45828044, 0.45812184, 0.45820114, 0.45820114, 0.45812184, 0.45820114, 0.45820114, 0.45820114, 0.45828044, 0.45812184, 0.45828044, 0.4580425, 0.4580425, 0.45820114, 0.45820114, 0.45820114, 0.45828044, 0.45820114, 0.45812184, 0.45820114, 0.45820114, 0.45820114]
valid_acc = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
train_loss = [8.31718591143032, 8.35966631966799, 8.358442889857413, 8.357219463677575, 8.357219470939055, 8.358442853550015, 8.357219473359548, 8.357219434631658, 8.357219487882508, 8.359666328139717, 8.357219499984973, 8.357219495143987, 8.35844288017544, 8.355996039918232, 8.357219415267712, 8.355996025395273, 8.358442889857413, 8.357219521769412, 8.358442892277907, 8.355996052020698, 8.35721946609807, 8.357219415267712, 8.35844288017544, 8.358442885016427, 8.357219463677575, 8.358442882595934, 8.355996003610834, 8.357219458836589, 8.355996064123163, 8.357520040521766, 8.357219487882508, 8.357219480621028, 8.358442897118893, 8.357219495143987, 8.357219446734124, 8.35721945157511, 8.355996056861684, 8.358442911641852, 8.355996047179712, 8.359666311196264, 8.359666286991333, 8.35721946609807, 8.357219458836589, 8.35721944431363, 8.355996035077245, 8.357219453995603, 8.358442909221358, 8.357219439472644, 8.357219429790671, 8.357219461257083]
valid_loss = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

test_loss = 0.0
test_acc = 1.0

RMSprop, bs: 64, lr: 0.001

train_acc = [0.5421161, 0.54179883, 0.54179883, 0.54171956, 0.54171956, 0.5419575, 0.54187816, 0.54179883, 0.54187816, 0.5419575, 0.5419575]
valid_acc = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
train_loss = [6.990036433118249, 7.025707591003573, 7.025707559537161, 7.026923776278036, 7.02692376054483, 7.023275266444017, 7.024491474713166, 7.025707566798641, 7.024491443246754, 7.023275273705497, 7.0232752761259905]
valid_loss = [15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457]

test_loss = 15.33323860168457
test_acc = 0.0

SGD, bs: 64, lr: 0.01, momentum: 0.2

train_acc = [0.5406091, 0.5419575, 0.54187816, 0.54179883, 0.54187816, 0.54187816, 0.54187816, 0.54187816, 0.54179883, 0.54171956, 0.54179883]
valid_acc = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
train_loss = [6.990036433118249, 7.025707591003573, 7.025707559537161, 7.026923776278036, 7.02692376054483, 7.023275266444017, 7.024491474713166, 7.025707566798641, 7.024491443246754, 7.023275273705497, 7.0232752761259905]
valid_loss = [15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457, 15.33323860168457]

test_loss = 15.33323860168457
test_acc = 0.0

SGD, bs: 64, lr: 0.01, momentum: 0.4

train_acc = [0.45740798, 0.45828044, 0.45820114, 0.45828044, 0.45820114, 0.4580425, 0.45820114, 0.45820114, 0.45820114, 0.45820114, 0.45820114]
valid_acc = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
train_loss = [8.329831461313413, 8.355996044759218, 8.357219475780042, 8.355996035077245, 8.357219502405467, 8.35966631603725, 8.357219461257083, 8.357219461257083, 8.357219456416097, 8.357219441893138, 8.357219478200534]
valid_loss = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

test_loss = 0.0
test_acc = 1.0

For the fine-tuning I've used the following top layers:

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Do you have some idea of why this happens?

Anyway, I'm still trying to train the network, but often the training accuracy increases while the validation accuracy behaves very chaotically, varying a lot from one epoch to another. Do you have any suggestions, please?

Training accuracy increasing while validation accuracy fluctuates is not surprising: the model is trying to "memorize" the training set, which is exactly why we keep a validation set, to catch it when it starts overfitting.
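If you just want to keep the best weights before overfitting sets in, one common option (not something specific to this question) is an early-stopping callback on the validation loss. This is only a minimal sketch; the patience value and the train_data/val_data names are placeholders, not from the question:

import tensorflow as tf

# Stop training once val_loss has not improved for `patience` epochs
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True)

# model.fit(train_data, validation_data=val_data, epochs=50, callbacks=[early_stop])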

Also, judging from the results, your model seems to be learning very little. Try tuning the hyperparameters.
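For reference, a common way to set up this kind of fine-tuning is to freeze the VGG16 convolutional base and train only the new head with a small learning rate. This is a minimal sketch assuming 224x224 RGB inputs and binary cross-entropy; it is not the exact code from the question:

import tensorflow as tf

# Pretrained convolutional base without the original classifier head.
base = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained weights intact at first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# A small learning rate so the pretrained features are not wiped out.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='binary_crossentropy',
              metrics=['accuracy'])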

One thing I have noticed (but cannot confirm): if you use transfer learning with a learning rate that is too large, it may destroy all the hard work of the pretrained model (here, VGG16). I found this learning rate scheduler in one of Google's notebooks; try using it:

import tensorflow as tf

start_lr = 0.00001
min_lr = 0.00001
# tpu_strategy is the tf.distribute strategy defined earlier in that notebook;
# for single-GPU/CPU training you can drop the multiplier.
max_lr = 0.00005 * tpu_strategy.num_replicas_in_sync
rampup_epochs = 5
sustain_epochs = 0
exp_decay = .8

def lrfn(epoch):
  # Linear warm-up, optional plateau, then exponential decay towards min_lr.
  if epoch < rampup_epochs:
    return (max_lr - start_lr) / rampup_epochs * epoch + start_lr
  elif epoch < rampup_epochs + sustain_epochs:
    return max_lr
  else:
    return (max_lr - min_lr) * exp_decay**(epoch - rampup_epochs - sustain_epochs) + min_lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)
...
model.fit(..., callbacks=[lr_callback])

The idea is to start with a low learning rate in the first epochs, then ramp it up and afterwards decrease it slowly.
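As a quick sanity check, you can print the first few values returned by lrfn from the snippet above (assuming a single replica, so max_lr is 0.00005) to see the ramp-up followed by the exponential decay:

# Linear warm-up over the first 5 epochs, then exponential decay towards min_lr.
for epoch in range(12):
    print(f"epoch {epoch:2d}: lr = {lrfn(epoch):.6f}")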
