
How to check if my model is overfitting or not when training with many epochs

I'm training my TensorFlow model for 100 epochs:

history = model.fit(..., steps_per_epoch=600, ..., epochs=100, ...)

Here is the training output up to epoch 7/100:

Epoch 1/100
600/600 [==============================] - ETA: 0s - loss: 0.1443 - rmse: 0.3799
Epoch 1: val_loss improved from inf to 0.14689, saving model to saved_model/my_model
2022-06-20 20:25:11.552250: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
INFO:tensorflow:Assets written to: saved_model/my_model/assets
600/600 [==============================] - 367s 608ms/step - loss: 0.1443 - rmse: 0.3799 - val_loss: 0.1469 - val_rmse: 0.3833

Epoch 2/100
600/600 [==============================] - ETA: 0s - loss: 0.1470 - rmse: 0.3834
Epoch 2: val_loss did not improve from 0.14689
600/600 [==============================] - 357s 594ms/step - loss: 0.1470 - rmse: 0.3834 - val_loss: 0.1559 - val_rmse: 0.3948

Epoch 3/100
600/600 [==============================] - ETA: 0s - loss: 0.1448 - rmse: 0.3805
Epoch 3: val_loss did not improve from 0.14689
600/600 [==============================] - 341s 569ms/step - loss: 0.1448 - rmse: 0.3805 - val_loss: 0.1634 - val_rmse: 0.4042

Epoch 4/100
600/600 [==============================] - ETA: 0s - loss: 0.1442 - rmse: 0.3798
Epoch 4: val_loss did not improve from 0.14689
600/600 [==============================] - 359s 599ms/step - loss: 0.1442 - rmse: 0.3798 - val_loss: 0.1529 - val_rmse: 0.3910

Epoch 5/100
600/600 [==============================] - ETA: 0s - loss: 0.1461 - rmse: 0.3822
Epoch 5: val_loss did not improve from 0.14689
600/600 [==============================] - 358s 596ms/step - loss: 0.1461 - rmse: 0.3822 - val_loss: 0.1493 - val_rmse: 0.3864

Epoch 6/100
600/600 [==============================] - ETA: 0s - loss: 0.1463 - rmse: 0.3825
Epoch 6: val_loss improved from 0.14689 to 0.14637, saving model to saved_model/my_model
INFO:tensorflow:Assets written to: saved_model/my_model/assets
600/600 [==============================] - 368s 613ms/step - loss: 0.1463 - rmse: 0.3825 - val_loss: 0.1464 - val_rmse: 0.3826

Epoch 7/100
324/600 [===============>..............] - ETA: 2:35 - loss: 0.1434 - rmse: 0.3786

In "Epoch 2/100", "val_loss: 0.1559" > "loss: 0.1470":

Epoch 2/100
600/600 [==============================] - ETA: 0s - loss: 0.1470 - rmse: 0.3834
Epoch 2: val_loss did not improve from 0.14689
600/600 [==============================] - 357s 594ms/step - loss: 0.1470 - rmse: 0.3834 - val_loss: 0.1559 - val_rmse: 0.3948

Based on this StackOverflow post, which says "validation loss > training loss you can call it some overfitting":

Training Loss and Validation Loss in Deep Learning

If validation loss >> training loss you can call it overfitting.
If validation loss  > training loss you can call it some overfitting.
If validation loss  < training loss you can call it some underfitting.
If validation loss << training loss you can call it underfitting.
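The quoted rule of thumb does not say how large a difference counts as ">" versus ">>". One simple way to look at it is to compute val_loss - loss for every epoch. The sketch below does that for the six completed epochs in the log above; the helper name loss_gaps is hypothetical and used only for illustration.

```python
# (loss, val_loss) per epoch, copied from the epoch summaries in the log above
train_loss = [0.1443, 0.1470, 0.1448, 0.1442, 0.1461, 0.1463]
val_loss = [0.1469, 0.1559, 0.1634, 0.1529, 0.1493, 0.1464]


def loss_gaps(train, val):
    """Return val_loss - loss for each epoch, rounded to 4 decimals (hypothetical helper)."""
    return [round(v - t, 4) for t, v in zip(train, val)]


gaps = loss_gaps(train_loss, val_loss)
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch}: val_loss - loss = {gap:+.4f}")
```

For these numbers the gap is positive in every epoch, peaks at epoch 3 (about 0.019) and shrinks to about 0.0001 by epoch 6, so it is small and fluctuating rather than steadily widening.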

So, is my model overfitting at "Epoch 2/100"? If yes, why does "Epoch 6/100" still show "Epoch 6: val_loss improved from 0.14689 to 0.14637, saving model to saved_model/my_model"?
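The "val_loss improved from ... saving model to saved_model/my_model" lines look like the verbose output of a tf.keras.callbacks.ModelCheckpoint callback configured with something like monitor="val_loss" and save_best_only=True; the question does not show its callbacks, so this is an assumption. Such a callback compares each epoch's val_loss only with the best val_loss seen so far, never with the training loss. The pure-Python sketch below mimics that bookkeeping using the val_loss values from the log (improved_epochs is a hypothetical name).

```python
import math

# val_loss per epoch, copied from the epoch summaries in the log above
val_losses = [0.1469, 0.1559, 0.1634, 0.1529, 0.1493, 0.1464]


def improved_epochs(values):
    """Mimic "save best only" with a minimising monitor: an epoch counts as an
    improvement only if its value is lower than the best value seen so far.
    Training loss plays no role in this decision."""
    best = math.inf
    improved = []
    for epoch, value in enumerate(values, start=1):
        if value < best:
            improved.append(epoch)
            best = value
    return improved


print(improved_epochs(val_losses))
```

This reports epochs 1 and 6, matching the log: epoch 6 counts as an improvement simply because its val_loss (0.14637) is lower than the previous best (0.14689), regardless of how it compares with that epoch's training loss.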

You can plot the training and validation losses that are saved in history at the end of training, and check at which epoch the model starts overfitting or underfitting.

Code for checking and plotting:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import tensorflow as tf

# Load MNIST, flatten each 28x28 image to 784 features and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255
x_test = x_test.reshape(-1, 784).astype("float32") / 255
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

# Small fully connected classifier
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(784,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# validation_split=0.2 holds out 20% of the training data, so history
# also records val_loss and val_accuracy for every epoch
history = model.fit(x_train, y_train, epochs=20, batch_size=32, validation_split=0.2)

# Long format: one row per (epoch, metric name, value)
df = pd.DataFrame(history.history).rename_axis('epoch').reset_index().melt(id_vars=['epoch'])
fig, axes = plt.subplots(1, 2, figsize=(18, 6))
for ax, mtr in zip(axes.flat, ['loss', 'accuracy']):
    ax.set_title(f'{mtr.title()} Plot')
    # 'loss' matches both 'loss' and 'val_loss'; likewise for 'accuracy'
    dfTmp = df[df['variable'].str.contains(mtr)]
    sns.lineplot(data=dfTmp, x='epoch', y='value', hue='variable', ax=ax)
fig.tight_layout()
plt.show()

Output: [figure: "Loss Plot" and "Accuracy Plot" panels, training vs. validation curves per epoch]

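In my opinion, the only signal of "overfitting" is that your validation loss starts to increase while your training loss is still decreasing, not "validation loss >> training loss".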
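A minimal sketch of that signal, assuming the per-epoch values are available in a dict shaped like Keras's history.history (keys "loss" and "val_loss"). The helper divergence_epochs and its patience parameter are hypothetical, chosen for illustration: patience requires the pattern to hold for several consecutive epochs, so that one noisy epoch is not read as a trend. The sample numbers are the six completed epochs from the question's log.

```python
def divergence_epochs(history, patience=1):
    """Return the 1-based epochs at which val_loss has gone up while loss has
    gone down for `patience` consecutive epochs (hypothetical helper)."""
    loss = history["loss"]
    val_loss = history["val_loss"]
    flagged = []
    streak = 0  # consecutive epochs showing "val_loss up, loss down"
    for i in range(1, len(loss)):
        if val_loss[i] > val_loss[i - 1] and loss[i] < loss[i - 1]:
            streak += 1
            if streak >= patience:
                flagged.append(i + 1)  # Keras logs count epochs from 1
        else:
            streak = 0
    return flagged


# Six completed epochs from the question's log, in the shape of history.history
history_dict = {
    "loss": [0.1443, 0.1470, 0.1448, 0.1442, 0.1461, 0.1463],
    "val_loss": [0.1469, 0.1559, 0.1634, 0.1529, 0.1493, 0.1464],
}

print(divergence_epochs(history_dict, patience=1))
print(divergence_epochs(history_dict, patience=2))
```

With patience=1 only epoch 3 is flagged (val_loss rose from 0.1559 to 0.1634 while loss fell from 0.1470 to 0.1448); val_loss then falls over epochs 4 to 6, so with patience=2 nothing is flagged. For a real run, pass history.history after model.fit returns.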

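Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.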

© 2020-2024 STACKOOM.COM