My model's validation accuracy is stuck! How do I fix it? [TensorFlow/Keras]
I am trying to train a regression model on a series of images on my local PC, which has an Nvidia RTX 2070 Super.
Input image shape: (96,160,3); 16072 samples (12857 for training, 3215 for validation).
However, the model is not learning. I tried using a custom model, as described in this reference.
model = Sequential()
model.add(Conv2D(input_shape=(96, 320, 3), filters=32, kernel_size=3, padding="valid"))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Activation('relu'))
model.add(Conv2D(filters=256, kernel_size=3, padding="valid"))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Activation('relu'))
model.add(Conv2D(filters=512, kernel_size=3, padding="valid"))
model.add(MaxPooling2D(pool_size=(3,3)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(2048))
model.add(Activation('relu'))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(Dense(1))
checkpoint = ModelCheckpoint(filepath="./ckpts/model.ckpt", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_acc', min_delta=0.0003, patience = 10)
lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=10000, decay_rate=0.9)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
model.compile(loss = loss, optimizer = optimizer, metrics=['accuracy'])
model.fit(X_train, y_train, validation_split = 0.2, shuffle = True, epochs = 100,
callbacks=[checkpoint, stopper])
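For reference, the learning-rate schedule in the code above can be evaluated by hand. This is a minimal sketch, assuming the continuous (non-staircase) form of `ExponentialDecay`, the Keras default batch size of 32 (no `batch_size` is passed to `fit` here), and that the 12857 training samples quoted earlier are what remains after `validation_split`. The names `decayed_lr` and `steps_per_epoch` are illustrative helpers, not Keras APIs.

```python
import math

def decayed_lr(step, initial_lr=0.1, decay_steps=10000, decay_rate=0.9):
    """Learning rate after `step` optimizer updates (continuous decay)."""
    return initial_lr * decay_rate ** (step / decay_steps)

train_samples = 12857   # training samples quoted in the question (assumption)
batch_size = 32         # Keras default when fit() gets no batch_size
steps_per_epoch = math.ceil(train_samples / batch_size)

for epoch in (1, 10, 100):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch:3d}: step {step:6d}, lr ~ {decayed_lr(step):.4f}")
```

Under these assumptions the rate has only dropped to roughly 0.096 after ten epochs, so every epoch that actually ran used a learning rate close to the initial value of 0.1.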
The training ran for 10 epochs and then terminated automatically once the patience of the early-stopping callback was exhausted. Essentially, the model did not train. The validation accuracy was stuck at 0.5353.
Then, suspecting that something might be wrong with my model itself, I tried transfer learning: I imported a pre-trained Inception V3 with ImageNet weights and trained with the pre-trained weights frozen (only the last two layers were trained).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Lambda, Dense, Flatten, Conv2D, Activation,
                                     MaxPooling2D, Dropout, GlobalAveragePooling2D)
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import Huber
from tensorflow.keras.optimizers.schedules import ExponentialDecay
X_train = preprocess_input(X_train)
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(299,299,3))
inception.trainable = False
driving_input = Input(shape=(96,320,3))
resized_input = Lambda(lambda image: tf.image.resize(image,(299,299)))(driving_input)
inp = inception(resized_input)
x = GlobalAveragePooling2D()(inp)
x = Dense(512, activation = 'relu')(x)
result = Dense(1, activation = 'relu')(x)
model = Model(inputs = driving_input, outputs = result)
model.compile(optimizer='Adam', loss='mse', metrics = ['accuracy'])
checkpoint = ModelCheckpoint(filepath="./ckpts/model.ckpt", monitor='val_loss', save_best_only=True)
stopper = EarlyStopping(monitor='val_acc', min_delta=0.0003, patience = 10)
batch_size = 32
epochs = 100
model.fit(x=X_train, y=y_train, shuffle=True, validation_split=0.2, epochs=epochs,
batch_size=batch_size, verbose=1, callbacks=[checkpoint, stopper])
This did not solve my issue.
Finally, I also tried unfreezing the weights and retraining the entire InceptionV3 network, initialized with the ImageNet pre-trained weights.
inception.trainable = True
This did not change my model's performance either. The validation accuracy is still stuck at the same value.
I don't understand why my model is not training. Please help me solve this issue. Note that I am not asking for help with improving my model in general; I can do that myself. What I can't do is get the model unstuck. Until that happens, I can't proceed with any fine-tuning or optimization. Any help would be much appreciated.
Thanks to Dr. Snoopy for pointing out the fundamental flaw: accuracy is a classification metric, not a regression one. I switched to monitoring validation loss using the following modified code:
checkpoint = ModelCheckpoint(filepath="./ckpts/model.h5", monitor='val_loss',
save_best_only=True)
stopper = EarlyStopping(monitor='val_loss', min_delta=0.0003, patience = 10)
lr_schedule = ExponentialDecay(initial_learning_rate=0.1, decay_steps=10000,
decay_rate=0.9)
optimizer = Adam(learning_rate=lr_schedule)
loss = Huber(delta=0.5, reduction="auto", name="huber_loss")
model.compile(loss = loss, optimizer = optimizer)
model.fit(X_train, y_train, validation_split = 0.2, shuffle = True, epochs = 100,
callbacks=[checkpoint, stopper])
model.load_weights('./ckpts/model.h5')
model.save('model.h5')
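A minimal sketch, in plain Python with no Keras involved, of why accuracy says little about a regression model. It assumes that for a single-unit output Keras resolves `'accuracy'` to a thresholded binary accuracy; that matches my understanding of the library but is not confirmed in this post. The helpers `binary_accuracy` and `mean_absolute_error` and the toy values are hypothetical stand-ins, not the real dataset or the library implementation.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of targets that exactly equal the thresholded prediction (0 or 1)."""
    hits = sum(
        1 for t, p in zip(y_true, y_pred)
        if t == (1.0 if p > threshold else 0.0)
    )
    return hits / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy continuous targets (made up for illustration); two are exactly 0.0.
y_true = [0.0, 0.12, -0.30, 0.0, 0.25]
poor_pred = [0.4, 0.4, 0.4, 0.4, 0.4]          # constant guess
good_pred = [0.01, 0.10, -0.28, 0.02, 0.24]    # close to the targets

for name, pred in (("poor", poor_pred), ("good", good_pred)):
    print(name,
          "accuracy:", binary_accuracy(y_true, pred),
          "MAE:", round(mean_absolute_error(y_true, pred), 3))
```

In this toy case both sets of predictions get the same accuracy, because it only counts targets that happen to equal exactly 0 or 1, while the mean absolute error clearly separates them. A loss-based quantity such as `val_loss` is therefore the more useful thing to monitor.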