My model's validation accuracy is stuck! How do I fix it? [TensorFlow/Keras]

Question

I am trying to train a regression model on a series of images on my local PC, which has an Nvidia RTX 2070 Super.

Input image shape: (96,160,3); 16072 samples (12857 for training, 3215 for validation).

However, the training is not taking place. I tried using a custom model, as mentioned in this reference.

model = Sequential()
model.add(Conv2D(input_shape=(96, 320, 3), filters=32, kernel_size=3, padding="valid"))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Activation('relu'))

model.add(Conv2D(filters=256, kernel_size=3, padding="valid"))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Activation('relu'))

model.add(Conv2D(filters=512, kernel_size=3, padding="valid"))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Activation('relu'))

model.add(Flatten())
model.add(Dense(2048))
model.add(Activation('relu'))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(1))

checkpoint = ModelCheckpoint(filepath="./ckpts/model.ckpt", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_acc', min_delta=0.0003, patience = 10)

lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
model.compile(loss = loss, optimizer = optimizer, metrics=['accuracy'])
model.fit(X_train, y_train, validation_split = 0.2, shuffle = True, epochs = 100, 
          callbacks=[checkpoint, stopper])

Training ran for 10 epochs before terminating automatically once the EarlyStopping patience was exhausted. Essentially, the model did not train. The validation accuracy was stuck at 0.5353.

Then, suspecting that something might be wrong with the model itself, I tried transfer learning: I imported a pre-trained Inception V3 with ImageNet weights and trained with the pre-trained weights frozen, so that only the last two layers were trained.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Lambda, Dense, Flatten, Conv2D, Activation,
                                     MaxPooling2D, Dropout, GlobalAveragePooling2D)
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import Huber
from tensorflow.keras.optimizers.schedules import ExponentialDecay

X_train = preprocess_input(X_train)
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
inception.trainable = False

driving_input = Input(shape=(96,320,3))
resized_input = Lambda(lambda image: tf.image.resize(image,(299,299)))(driving_input)
inp = inception(resized_input)

x = GlobalAveragePooling2D()(inp)
x = Dense(512, activation = 'relu')(x)
result = Dense(1, activation = 'relu')(x)


model = Model(inputs = driving_input, outputs = result)
model.compile(optimizer='Adam', loss='mse', metrics = ['accuracy'])

checkpoint = ModelCheckpoint(filepath="./ckpts/model.ckpt", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_acc', min_delta=0.0003, patience = 10)

batch_size = 32
epochs = 100

model.fit(x=X_train, y=y_train, shuffle=True, validation_split=0.2, epochs=epochs, 
          batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])

The result did not solve my issue.

Finally, I also tried unfreezing the weights and retraining the entire InceptionV3 network, initialized with the ImageNet pre-trained weights.

inception.trainable = True

This did not change my model's performance either. The validation accuracy is still stuck at the same value.

I don't understand why my model is not training. Please help me solve this issue. Note that I am not asking for help with improving my model in general; I can do that myself. What I can't do is get the model unstuck, and until that happens I can't proceed with any fine-tuning or optimization. Any help would be much appreciated.

Answer 1

Thanks to Dr. Snoopy for pointing out the fundamental flaw: accuracy is a classification metric, so it is not meaningful for a regression model. I switched to monitoring validation loss using the following modified code:

# Save only the weights with the lowest validation loss seen so far.
checkpoint = ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss',
                             save_best_only=True)
# Stop when validation loss (not accuracy) has not improved for 10 epochs.
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)

lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=10000,
                                decay_rate=0.9)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
# No 'accuracy' metric: the Huber loss itself is the quantity being tracked.
model.compile(loss = loss, optimizer = optimizer)
model.fit(X_train, y_train, validation_split = 0.2, shuffle = True, epochs = 100,
          callbacks=[checkpoint, stopper])

model.load_weights('./ckpts/model.h5')
model.save('model.h5')
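
As a side note on the learning-rate schedule in the code above, the following is a minimal sketch of the arithmetic behind ExponentialDecay, which (with the default staircase=False) computes initial_learning_rate * decay_rate ** (step / decay_steps). The function name `exponential_decay` is hypothetical and written here only for illustration. The step counts assume 12857 training samples, as stated in the question, and a batch size of 32, which is the Keras default when `batch_size` is not passed to `fit`; both are assumptions about the actual run.

```python
import math

def exponential_decay(step, initial_lr=0.1, decay_steps=10000, decay_rate=0.9):
    """Learning rate after `step` optimizer steps (continuous, non-staircase decay)."""
    return initial_lr * decay_rate ** (step / decay_steps)

# Assumed values: training samples from the question, Keras default batch size.
train_samples = 12857
batch_size = 32
steps_per_epoch = math.ceil(train_samples / batch_size)

# Print the learning rate reached at the end of a few epochs.
for epoch in (0, 1, 10, 100):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch:3d}  step {step:6d}  lr {exponential_decay(step):.4f}")
```

Under these assumptions the learning rate falls by 10% only once every 10000 steps, so it stays close to its initial value of 0.1 for the first several epochs. For comparison, the default learning rate of the Keras Adam optimizer is 0.001.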

The result was an unstuck model.
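
To illustrate why accuracy can sit at a fixed value while a regression model is in fact changing, here is a minimal pure-Python sketch. It assumes that Keras resolves the string 'accuracy' on a single-unit output to binary accuracy, which thresholds each prediction at 0.5 and checks it for equality with the label; this is my understanding of tf.keras, not something confirmed in the thread. The helper functions and all the numbers below are made up for illustration.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose label equals the thresholded prediction (0.0 or 1.0)."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if t == (1.0 if p > threshold else 0.0))
    return hits / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical continuous targets; half of them are exactly 0.0.
y_true = [0.0, 0.0, 0.0, 0.0, 0.12, -0.30, 0.25, -0.07]

# Predictions from two hypothetical models: a poor one and a much better one.
poor_pred = [0.20, -0.20, 0.30, -0.10, -0.20, 0.10, -0.10, 0.30]
good_pred = [0.01, -0.02, 0.00, 0.01, 0.10, -0.28, 0.24, -0.05]

acc_poor = binary_accuracy(y_true, poor_pred)
acc_good = binary_accuracy(y_true, good_pred)
mae_poor = mean_absolute_error(y_true, poor_pred)
mae_good = mean_absolute_error(y_true, good_pred)

print(f"poor model: accuracy={acc_poor:.3f}  MAE={mae_poor:.3f}")
print(f"good model: accuracy={acc_good:.3f}  MAE={mae_good:.3f}")
```

In this toy data both models get the same accuracy, because every prediction falls below the threshold and the score only counts the labels that are exactly 0.0, while the mean absolute error clearly separates them. A constant value such as 0.5353 could arise the same way, though that reading is a guess and not verified against the asker's data. An error-based quantity such as val_loss is the safer thing to monitor for regression.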

Answer 1 · 0 · accepted · 2020-11-23 03:38:23
