
Interpreting training loss/accuracy vs validation loss/accuracy

I have a few questions about interpreting the performance of certain optimizers on MNIST using a LeNet5 network, and about what the validation loss/accuracy vs training loss/accuracy graphs tell us exactly. Everything is done in Keras using a standard LeNet5 network, and it is run for 15 epochs with a batch size of 128.

There are two graphs, train acc vs val acc and train loss vs val loss. I made 4 graphs because I ran it twice, once with validation_split = 0.1 and once with validation_data = (x_test, y_test) in the model.fit parameters. Specifically, the difference is shown here:

train = model.fit(x_train, y_train, epochs=15, batch_size=128, validation_data=(x_test,y_test), verbose=1)
train = model.fit(x_train, y_train, epochs=15, batch_size=128, validation_split=0.1, verbose=1)

These are the graphs I produced:

using validation_data=(x_test, y_test):

[plot: train/val accuracy and loss curves]

using validation_split=0.1:

[plot: train/val accuracy and loss curves]

So my two questions are:

1.) How do I interpret both the train acc vs val acc and the train loss vs val loss graphs? What exactly do they tell me, and why do different optimizers have different performances (i.e. the graphs differ as well)?

2.) Why do the graphs change when I use validation_split instead? Which one is the better choice to use?

I will attempt to provide an answer.

  1. You can see that towards the end, training accuracy is slightly higher than validation accuracy and training loss is slightly lower than validation loss. This hints at overfitting, and if you train for more epochs the gap should widen.
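One way to act on that widening gap is to stop training once validation loss stops improving. A minimal, framework-free sketch of the patience logic behind early stopping (the loss values below are hypothetical):

```python
def should_stop(val_losses, patience=3):
    """Return True once validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    # stop if none of the last `patience` epochs beat the earlier best loss
    return min(val_losses[-patience:]) >= best

# hypothetical per-epoch validation losses: improvement stalls after epoch 4
history = [0.90, 0.55, 0.40, 0.38, 0.39, 0.41, 0.40]
print(should_stop(history, patience=3))  # True
```

In Keras you would typically get this behaviour by passing keras.callbacks.EarlyStopping(monitor='val_loss', patience=3) in the callbacks argument of model.fit rather than hand-rolling the check.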

    Even if you use the same model with the same optimizer, you will notice slight differences between runs because the weights are initialized randomly and there is randomness associated with the GPU implementation. You can look here for how to address this issue.
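The weight-initialization part of that run-to-run variation is easy to demonstrate; here is a small NumPy stand-in for a layer initializer (the shape and seed are arbitrary):

```python
import numpy as np

def init_weights(rng):
    # stand-in for a layer's random weight initialization
    return rng.normal(0.0, 0.05, size=(3, 3))

# two unseeded runs give different initial weights ...
w1 = init_weights(np.random.default_rng())
w2 = init_weights(np.random.default_rng())
print(np.allclose(w1, w2))   # almost always False

# ... while fixing the seed makes the runs reproducible
w3 = init_weights(np.random.default_rng(42))
w4 = init_weights(np.random.default_rng(42))
print(np.allclose(w3, w4))   # True
```

In TensorFlow/Keras the analogous fix is tf.random.set_seed(...); making GPU kernels fully deterministic may require additional settings depending on the TF version, which is what the linked answer covers.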

    Different optimizers will usually produce different graphs because they update the model parameters differently. For example, vanilla SGD updates all parameters at a constant rate at every training step. But if you add momentum, the rate depends on previous updates, which usually results in faster convergence. This means you can reach the same accuracy as vanilla SGD in fewer iterations.
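The effect of momentum can be sketched without Keras at all. Below, heavy-ball momentum is compared against plain gradient descent on the toy objective f(x) = x², counting how many updates each needs to get close to the minimum (the learning rate and momentum values are illustrative):

```python
def steps_to_converge(momentum=0.0, lr=0.01, tol=1e-3, max_steps=100000):
    """Minimize f(x) = x^2 with (heavy-ball) gradient descent.

    Returns the number of updates until |x| < tol.
    """
    x, v = 5.0, 0.0
    for step in range(1, max_steps + 1):
        grad = 2.0 * x              # f'(x) = 2x
        v = momentum * v - lr * grad
        x = x + v
        if abs(x) < tol:
            return step
    return max_steps

plain = steps_to_converge(momentum=0.0)
heavy = steps_to_converge(momentum=0.9)
print(plain, heavy)  # the momentum run needs far fewer updates here
```

In Keras the same switch is just SGD(learning_rate=...) vs SGD(learning_rate=..., momentum=0.9) passed to model.compile, which is why the resulting curves look different.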

  2. The graphs will change because the training data changes if you split it randomly. But for MNIST you should use the standard test split provided with the dataset.
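For reference, this is roughly what the two model.fit options correspond to; the arrays below are small hypothetical stand-ins for the real MNIST data. (Note that, per the Keras documentation, validation_split takes the last fraction of the supplied data rather than a random sample.)

```python
import numpy as np

x_train = np.arange(1000)        # stand-in for the 60000 MNIST training images
x_test = np.arange(1000, 1100)   # stand-in for the 10000 test images

# validation_split=0.1: carve the LAST 10% off the training data
n_val = int(len(x_train) * 0.1)
x_tr, x_val = x_train[:-n_val], x_train[-n_val:]
print(len(x_tr), len(x_val))     # 900 100

# validation_data=(x_test, y_test): train on all of x_train, validate on
# the fixed test split, so validation metrics are comparable across runs
print(len(x_train), len(x_test))  # 1000 100
```

One caveat with validation_data=(x_test, y_test): once the test set is used for per-epoch validation and tuning decisions, it no longer gives an unbiased estimate of final performance, so a separate held-out validation set is the safer habit even on MNIST.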
