
Unstable accuracy and loss in my keras neural network model

I've built an NN model for a binary classification problem with the help of keras. Here's the code:

from keras import models, layers

# create a new model
nn_model = models.Sequential()

# add input and dense layer
nn_model.add(layers.Dense(128, activation='relu', input_shape=(22,))) # 128 is the number of the hidden units and 22 is the number of features
nn_model.add(layers.Dense(16, activation='relu'))
nn_model.add(layers.Dense(16, activation='relu'))

# add a final layer
nn_model.add(layers.Dense(1, activation='sigmoid'))

# I have 3000 rows split from the training set to monitor the accuracy and loss
# compile and train the model
nn_model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])

history = nn_model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512, # The batch size defines the number of samples that will be propagated through the network.
                    validation_data=(x_val, y_val))

Here's the training log:

Train on 42663 samples, validate on 3000 samples
Epoch 1/20
42663/42663 [==============================] - 0s 9us/step - loss: 0.2626 - acc: 0.8960 - val_loss: 0.2913 - val_acc: 0.8767
Epoch 2/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2569 - acc: 0.8976 - val_loss: 0.2625 - val_acc: 0.9007
Epoch 3/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2560 - acc: 0.8958 - val_loss: 0.2546 - val_acc: 0.8900
Epoch 4/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2538 - acc: 0.8970 - val_loss: 0.2451 - val_acc: 0.9043
Epoch 5/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2526 - acc: 0.8987 - val_loss: 0.2441 - val_acc: 0.9023
Epoch 6/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2507 - acc: 0.8997 - val_loss: 0.2825 - val_acc: 0.8820
Epoch 7/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2504 - acc: 0.8993 - val_loss: 0.2837 - val_acc: 0.8847
Epoch 8/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2507 - acc: 0.8988 - val_loss: 0.2631 - val_acc: 0.8873
Epoch 9/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2471 - acc: 0.9012 - val_loss: 0.2788 - val_acc: 0.8823
Epoch 10/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2489 - acc: 0.8997 - val_loss: 0.2414 - val_acc: 0.9010
Epoch 11/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2471 - acc: 0.9017 - val_loss: 0.2741 - val_acc: 0.8880
Epoch 12/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2458 - acc: 0.9016 - val_loss: 0.2523 - val_acc: 0.8973
Epoch 13/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2433 - acc: 0.9022 - val_loss: 0.2571 - val_acc: 0.8940
Epoch 14/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2457 - acc: 0.9012 - val_loss: 0.2567 - val_acc: 0.8973
Epoch 15/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2421 - acc: 0.9020 - val_loss: 0.2411 - val_acc: 0.8957
Epoch 16/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2434 - acc: 0.9007 - val_loss: 0.2431 - val_acc: 0.9000
Epoch 17/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2431 - acc: 0.9021 - val_loss: 0.2398 - val_acc: 0.9000
Epoch 18/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2435 - acc: 0.9018 - val_loss: 0.2919 - val_acc: 0.8787
Epoch 19/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2409 - acc: 0.9029 - val_loss: 0.2478 - val_acc: 0.8943
Epoch 20/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2426 - acc: 0.9020 - val_loss: 0.2380 - val_acc: 0.9007

I plotted the accuracy and loss for both the training and validation sets:

[plots of training and validation accuracy and loss]

As we can see, the result is not very stable, so I picked two epochs and retrained on the full training set. Here's the new log:

Epoch 1/2
45663/45663 [==============================] - 0s 7us/step - loss: 0.5759 - accuracy: 0.7004
Epoch 2/2
45663/45663 [==============================] - 0s 5us/step - loss: 0.5155 - accuracy: 0.7341

My question is: why is the accuracy so unstable, and why is it only 73% for the retrained model? How can I improve the model? Thanks.

Your validation size is 3000 and your train size is 42663, which means your validation set is around 7% of the data. Your validation accuracy is jumping between .88 and .90, which is a ±2% swing. 7% validation data is too small to get good statistics, and with just 7% of the data a ±2% swing is not bad. Normally the validation data should be 20% to 25% of the total data, i.e. a 75-25 train-val split.

Also make sure you shuffle the data before making the train-val split.

If X and y are your full datasets, then use

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

which shuffles the data and gives you a 75-25 split.
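For a binary classification problem it can also help to stratify the split so train and validation keep the same class ratio. A small sketch with synthetic stand-in data (only the 22-feature width comes from the question; the rest is made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic stand-in for the real dataset: 1000 rows, 22 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 22))
y = rng.integers(0, 2, size=1000)

# shuffle (the default) and stratify, so both splits keep the same class ratio
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

print(X_train.shape, X_val.shape)  # (750, 22) (250, 22)
```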

I don't think it's unstable at all for the validation accuracy to oscillate between 88% and 90%. If you put it on a 0-100 scale, this "instability" looks absolutely tiny.

import numpy as np
import matplotlib.pyplot as plt

plt.plot(np.arange(20), np.random.randint(88, 91, 20))  # high bound is exclusive
plt.title('Random Values Between 88 and 90')
plt.ylim(0, 100)
plt.show()


It's hard to tell without knowing the dataset. Currently you only use Dense layers; depending on your problem, RNNs or convolutional layers might suit the case better. I can also see that you use a pretty high batch size of 512. There are a lot of opinions about what the batch size should be. From my experience, a batch size of more than 128 might cause bad convergence, but this depends on many things.

You might also add some regularization to your net by using Dropout layers.
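As a sketch of that idea, Dropout layers could be inserted between the Dense layers of the model from the question (the rate of 0.3 is an arbitrary starting point, not a tuned value):

```python
from keras import models, layers

nn_model = models.Sequential()
nn_model.add(layers.Dense(128, activation='relu', input_shape=(22,)))
nn_model.add(layers.Dropout(0.3))  # randomly zero 30% of activations during training
nn_model.add(layers.Dense(16, activation='relu'))
nn_model.add(layers.Dropout(0.3))
nn_model.add(layers.Dense(16, activation='relu'))
nn_model.add(layers.Dense(1, activation='sigmoid'))
```

Dropout is only active during training; at inference time the layers pass activations through unchanged.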

Another point: you might want to pass shuffle=True to model.fit(), otherwise the model will always see the same data in the same order, which can lower its ability to generalize.

Implementing these changes might fix the "bouncing loss"; I think shuffling is the most important one.
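Putting the shuffle and batch-size suggestions together, the fit call might look like the sketch below. The real partial_x_train / partial_y_train come from the question; the tiny synthetic arrays here exist only to make the snippet runnable on its own:

```python
import numpy as np
from keras import models, layers

# tiny synthetic stand-in for partial_x_train / partial_y_train from the question
rng = np.random.default_rng(0)
partial_x_train = rng.normal(size=(512, 22)).astype('float32')
partial_y_train = rng.integers(0, 2, size=(512, 1)).astype('float32')

nn_model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(22,)),
    layers.Dense(1, activation='sigmoid'),
])
nn_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

# smaller batch size, and explicit shuffling between epochs
history = nn_model.fit(partial_x_train, partial_y_train,
                       epochs=1, batch_size=64, shuffle=True, verbose=0)
```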
