
Training loss higher than validation loss

I am trying to train a regression model of a dummy function of 3 variables with fully connected neural networks in Keras, and I always get a training loss much higher than the validation loss.

I split the data set into 2/3 for training and 1/3 for validation. I have tried lots of different things:

  • changing the architecture
  • adding more neurons
  • using regularization
  • using different batch sizes

Still, the training error is one order of magnitude higher than the validation error:

Epoch 5995/6000
4020/4020 [==============================] - 0s 78us/step - loss: 1.2446e-04 - mean_squared_error: 1.2446e-04 - val_loss: 1.3953e-05 - val_mean_squared_error: 1.3953e-05
Epoch 5996/6000
4020/4020 [==============================] - 0s 98us/step - loss: 1.2549e-04 - mean_squared_error: 1.2549e-04 - val_loss: 1.5730e-05 - val_mean_squared_error: 1.5730e-05
Epoch 5997/6000
4020/4020 [==============================] - 0s 105us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4372e-05 - val_mean_squared_error: 1.4372e-05
Epoch 5998/6000
4020/4020 [==============================] - 0s 96us/step - loss: 1.2500e-04 - mean_squared_error: 1.2500e-04 - val_loss: 1.4151e-05 - val_mean_squared_error: 1.4151e-05
Epoch 5999/6000
4020/4020 [==============================] - 0s 80us/step - loss: 1.2487e-04 - mean_squared_error: 1.2487e-04 - val_loss: 1.4342e-05 - val_mean_squared_error: 1.4342e-05
Epoch 6000/6000
4020/4020 [==============================] - 0s 79us/step - loss: 1.2494e-04 - mean_squared_error: 1.2494e-04 - val_loss: 1.4769e-05 - val_mean_squared_error: 1.4769e-05

This makes no sense, please help!

Edit: this is the full code

I have 6000 training examples

# -*- coding: utf-8 -*-
"""
Created on Mon Feb 26 13:40:03 2018

@author: Michele
"""
#from keras.datasets import reuters
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras import optimizers
import matplotlib.pyplot as plt
import os 
import pylab 
from keras.constraints import maxnorm
from sklearn.model_selection import train_test_split
from keras import regularizers
from sklearn.preprocessing import MinMaxScaler
import math
from sklearn.metrics import mean_squared_error
import keras

# fix random seed for reproducibility
seed=7
np.random.seed(seed)

# load input (X) variables
dataset = np.loadtxt("BabbaX.csv", delimiter=",")
#x = dataset.transpose()[:,10:15] #only use power
x = dataset
del(dataset)  # free the container

# load output (Y) variable
dataset = np.loadtxt("BabbaY.csv", delimiter=",")
y = dataset.transpose()
del(dataset)  # free the container

# scale labels to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
y = np.reshape(y, (y.shape[0], 1))
y = scaler.fit_transform(y)

lenData=x.shape[0]
x=np.transpose(x)

xtrain=x[:,0:round(lenData*0.67)]
ytrain=y[0:round(lenData*0.67),]
xtest=x[:,round(lenData*0.67):round(lenData*1.0)]
ytest=y[round(lenData*0.67):round(lenData*1.0)]

xtrain=np.transpose(xtrain)
xtest=np.transpose(xtest)    

l2_lambda = 0.1 #reg factor

#sequential type of model
model = Sequential() 
#stacking layers with .add
units=300
#model.add(Dense(units, input_dim=xtest.shape[1], activation='relu', kernel_initializer='normal', kernel_regularizer=regularizers.l2(l2_lambda), kernel_constraint=maxnorm(3)))
model.add(Dense(units, activation='relu', input_dim=xtest.shape[1]))
#model.add(Dropout(0.1))
model.add(Dense(units, activation='relu'))
#model.add(Dropout(0.1))
model.add(Dense(1)) #no activation function should be used for the output layer

# It is recommended to leave the parameters of this optimizer at their
# default values (except the learning rate, which can be freely tuned).
rms = optimizers.RMSprop(lr=0.00001, rho=0.9, epsilon=None, decay=0)
adam = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=1e-6, amsgrad=False)
#adam = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

#configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])

# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, epochs=1000, batch_size=round(xtest.shape[0]/100),
                    validation_data=(xtest, ytest), shuffle=True, verbose=2)

#extract weights for each layer
weights = [layer.get_weights() for layer in model.layers]

#evaluate on training data set
valuesTrain=model.predict(xtrain)

#evaluate on test data set
valuesTest=model.predict(xtest)

# invert predictions back to the original scale
valuesTrain = scaler.inverse_transform(valuesTrain)
ytrain = scaler.inverse_transform(ytrain)
valuesTest = scaler.inverse_transform(valuesTest)
ytest = scaler.inverse_transform(ytest)

TL;DR: When a model is learning well and quickly, the validation loss can be lower than the training loss, since validation happens on the updated model, while the training loss had none (without batches) or only some (with batches) of the epoch's updates applied.
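A quick way to verify this on the model from the question (a sketch reusing the `model`, `xtrain`/`ytrain`, and `xtest`/`ytest` names defined above; `model.evaluate` recomputes the loss on the final, fully updated weights):

# Re-evaluate both sets on the *final* weights after training finishes;
# measured on the same model state, the train/validation gap largely closes.
# evaluate returns [loss, mse] because the model was compiled with metrics=['mse'].
train_scores = model.evaluate(xtrain, ytrain, verbose=0)
test_scores = model.evaluate(xtest, ytest, verbose=0)
print('post-training MSE - train: %e, validation: %e' % (train_scores[0], test_scores[0]))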


Okay, I think I found out what's happening here. I used the following code to test this.

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt

np.random.seed(7)

N_DATA = 6000

x = np.random.uniform(-10, 10, (3, N_DATA))
y = x[0] + x[1]**2 + x[2]**3

xtrain = x[:, 0:round(N_DATA*0.67)]
ytrain = y[0:round(N_DATA*0.67)]

xtest = x[:, round(N_DATA*0.67):N_DATA]
ytest = y[round(N_DATA*0.67):N_DATA]

xtrain = np.transpose(xtrain)
xtest = np.transpose(xtest)

model = Sequential()
model.add(Dense(10, activation='relu', input_dim=3))
model.add(Dense(5, activation='relu'))
model.add(Dense(1))

adam = keras.optimizers.Adam()

# configure learning process with .compile
model.compile(optimizer=adam, loss='mean_squared_error', metrics=['mse'])

# fit the model (iterate on the training data in batches)
history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtest, ytest), shuffle=False, verbose=2)

plt.plot(history.history['mean_squared_error'])
plt.plot(history.history['val_loss'])
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

This is essentially the same as your code and replicates the problem, which is not actually a problem. Simply change

history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtest, ytest), shuffle=False, verbose=2)

to

history = model.fit(xtrain, ytrain, epochs=50,
                    batch_size=round(N_DATA/100),
                    validation_data=(xtrain, ytrain), shuffle=False, verbose=2)

So instead of validating with your validation data, you validate using the training data again, which leads to exactly the same behavior. Weird, isn't it? No, actually not. What I think is happening is:

The mean_squared_error that Keras reports for every epoch is the loss before the gradients have been applied, while the validation happens after the gradients have been applied, which makes sense.
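To make the two numbers directly comparable, you can re-evaluate the training set after each epoch's updates with a callback (a sketch; `TrueTrainLoss` is a hypothetical helper, not part of the Keras API):

import keras

class TrueTrainLoss(keras.callbacks.Callback):
    """Re-evaluates the training set *after* each epoch's weight updates,
    so the printed loss is directly comparable to val_loss."""
    def __init__(self, xtrain, ytrain):
        super(TrueTrainLoss, self).__init__()
        self.xtrain = xtrain
        self.ytrain = ytrain

    def on_epoch_end(self, epoch, logs=None):
        # self.model is attached by Keras before training starts;
        # [0] is the loss (the model was compiled with metrics=['mse'])
        loss = self.model.evaluate(self.xtrain, self.ytrain, verbose=0)[0]
        print(' - post-update train loss: %e' % loss)

# usage: model.fit(..., callbacks=[TrueTrainLoss(xtrain, ytrain)])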

With the highly stochastic problems for which NNs are usually used, you do not see this, because the data varies so much that the updated weights are simply not good enough to describe the validation data; the slight overfitting to the training data is still so much stronger that, even after updating the weights, the validation loss remains higher than the training loss from before. That is only how I think it is, though; I might be completely wrong.

One possible fix: you can increase the size of the training set and decrease the size of the validation set. Then your model will be trained on more samples, which may include some complex ones as well, and can then be validated on the remaining samples. Try something like 80% training / 20% validation, or any split with a little more training data than what you used previously.

If you don't want to change the sizes of the training and validation sets, you can try changing the random seed to some other value, so that you get a training set with different samples, which might help train the model better.
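A minimal sketch of both suggestions with scikit-learn (the 80/20 ratio and `random_state=42` are illustrative values, not from the original post):

from sklearn.model_selection import train_test_split

# x: (n_samples, n_features), y: (n_samples,)
# test_size=0.2 gives an 80/20 split; changing random_state reshuffles
# which samples land in the training vs. validation set.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2,
                                                random_state=42)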

Check this answer here to get more understanding of the other possible reasons.

Check this link if you want a more detailed explanation with an example. @Michele

If the training loss is a little higher than, or close to, the validation loss, it means the model is not overfitting. The goal is always to make the best use of the features so as to reduce overfitting and improve validation and test accuracy. A probable reason you keep getting a higher training loss is the features and data you are using to train.

Please refer to the following link and observe the training and validation loss in the dropout case: http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
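For context on why dropout shows this pattern: Keras applies dropout only during training passes, so the training loss is measured on a thinned network while the validation loss uses all units. A minimal sketch (the layer sizes and 0.2 rate are illustrative):

from keras.models import Sequential
from keras.layers import Dense, Dropout

# Dropout zeroes 20% of activations during training only; at validation
# time every unit is active, so val_loss is computed on a stronger
# network than the one the training loss saw.
model = Sequential()
model.add(Dense(300, activation='relu', input_dim=3))
model.add(Dropout(0.2))
model.add(Dense(300, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')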
