
How to monitor loss & val_loss at the same time to avoid overfitting a neural network to either the train set or the test set?

I've been taking part in a hackathon and experimenting with Keras callbacks and neural networks. Is there a way to monitor not only loss or val_loss but BOTH of them, to avoid overfitting to either the train set or the test set? For example, can I supply a function for the monitor field instead of a single field name?

Say I want to monitor val_loss to pick the lowest value, but I also want a second criterion: the minimum difference between val_loss and loss.

You can choose between two approaches:

  1. Create a custom metric to record the value you want, by subclassing tf.keras.metrics.Metric. See https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Metric for an example.
     You can then use your metric in standard callbacks, e.g. EarlyStopping().

  2. Create a custom callback to do the calculation (and take the action) you want, by subclassing tf.keras.callbacks.Callback. See https://www.tensorflow.org/guide/keras/custom_callback for how to do this.
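As a sketch of approach 1 (assuming TensorFlow 2.x; the class and the metric name custom_mae are illustrative choices, not part of the Keras API), a minimal custom metric might look like this:

```python
import tensorflow as tf

class MeanAbsoluteErrorMetric(tf.keras.metrics.Metric):
    """Minimal custom metric: running mean of |y_true - y_pred|."""

    def __init__(self, name='custom_mae', **kwargs):
        super().__init__(name=name, **kwargs)
        # State variables that accumulate across batches within an epoch.
        self.total = self.add_weight(name='total', initializer='zeros')
        self.count = self.add_weight(name='count', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        self.total.assign_add(tf.reduce_sum(tf.abs(y_true - y_pred)))
        self.count.assign_add(tf.cast(tf.size(y_pred), tf.float32))

    def result(self):
        return self.total / self.count
```

Once passed to model.compile(metrics=[MeanAbsoluteErrorMetric()]), Keras reports it as custom_mae on the training data and val_custom_mae on the validation data, and either name can be used as the monitor argument of EarlyStopping().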

I have an answer to a pretty similar problem here.

Basically, it is not possible to monitor multiple metrics with the built-in Keras callbacks. However, you can define a custom callback (see the documentation for more info) that can access the logs at each epoch and do some operations on them.

Let's say you want to monitor loss and val_loss; you can do something like this:

import tensorflow as tf

class CombineCallback(tf.keras.callbacks.Callback):
    """Write a combined train + validation loss into the logs at each epoch end."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        logs['combine_metric'] = logs['val_loss'] + logs['loss']

Side note: in my opinion, the most important thing is to monitor the validation loss. The training loss will of course keep dropping, so observing it is not really that meaningful. If you really want to monitor both, I suggest adding a multiplicative factor to give more weight to the validation loss. In that case:

class CombineCallback(tf.keras.callbacks.Callback):
    """Weighted combination: give the validation loss more weight."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        factor = 0.8  # weight on the validation loss
        logs['combine_metric'] = factor * logs['val_loss'] + (1 - factor) * logs['loss']

Then you can use it like this:

model.fit(
    ...
    callbacks=[CombineCallback()],
)
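To actually act on the combined value, the custom callback can be paired with a built-in one. A sketch, assuming the CombineCallback defined above; note that CombineCallback must appear before EarlyStopping in the callbacks list, so that combine_metric is already in the logs when EarlyStopping reads them:

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='combine_metric',  # the key CombineCallback writes into logs
    mode='min',                # a lower combined loss is better
    patience=3,
    restore_best_weights=True,
)

# model.fit(..., callbacks=[CombineCallback(), early_stop])
```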

You can also draw more inspiration here.

Below is a Keras custom callback that should do the job. The callback monitors both the training accuracy and the validation accuracy. The form of the callback is callbacks=[SOMT(model, train_thold, valid_thold)], where:

  • model is your compiled model
  • train_thold is a float: the training accuracy (as a fraction, e.g. 0.95) that the model must achieve in order to conditionally stop training
  • valid_thold is a float: the validation accuracy (as a fraction) that the model must achieve in order to conditionally stop training. Note: to stop training, BOTH train_thold and valid_thold must be met in the SAME epoch.
    If you want to stop training based solely on the training accuracy, set valid_thold to 0.0.
    Similarly, if you want to stop training based solely on the validation accuracy, set train_thold=0.0.

Note that if both thresholds are not achieved in the same epoch, training will continue until the number of epochs specified in model.fit is reached. For example, to stop training when the training accuracy has reached or exceeded 95% and the validation accuracy has reached at least 85%, the code would be callbacks=[SOMT(my_model, .95, .85)].

import time
from tensorflow import keras

class SOMT(keras.callbacks.Callback):
    """Stop On Meeting Thresholds: halt training once BOTH the training and
    validation accuracy thresholds are met in the same epoch."""

    def __init__(self, model, train_thold, valid_thold):
        super(SOMT, self).__init__()
        self.model = model
        self.train_thold = train_thold
        self.valid_thold = valid_thold

    def on_train_begin(self, logs=None):
        print('Starting training - training will halt if training accuracy achieves or exceeds', self.train_thold)
        print('and validation accuracy meets or exceeds', self.valid_thold)
        msg = '{0:^8s}{1:^12s}{2:^12s}{3:^12s}{4:^12s}{5:^12s}'.format('Epoch', 'Train Acc', 'Train Loss', 'Valid Acc', 'Valid Loss', 'Duration')
        print(msg)

    def on_train_batch_end(self, batch, logs=None):
        acc = logs.get('accuracy') * 100  # training accuracy in percent (assumes metrics=['accuracy'])
        loss = logs.get('loss')
        msg = '{0:1s}processed batch {1:4s}  training accuracy= {2:8.3f}  loss: {3:8.5f}'.format(' ', str(batch), acc, loss)
        print(msg, '\r', end='')  # print over the same line to show a running batch count

    def on_epoch_begin(self, epoch, logs=None):
        self.now = time.time()  # start the epoch timer

    def on_epoch_end(self, epoch, logs=None):
        duration = time.time() - self.now
        tacc = logs.get('accuracy')
        vacc = logs.get('val_accuracy')
        tr_loss = logs.get('loss')
        v_loss = logs.get('val_loss')
        ep = epoch + 1
        print(f'{ep:^8.0f}{tacc:^12.2f}{tr_loss:^12.4f}{vacc:^12.2f}{v_loss:^12.4f}{duration:^12.2f}')
        if tacc >= self.train_thold and vacc >= self.valid_thold:
            print(f'\ntraining accuracy and validation accuracy reached the thresholds on epoch {ep}')
            self.model.stop_training = True  # stop training
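The stopping rule at the heart of on_epoch_end can be isolated as a plain function to check its behaviour in isolation (should_stop is a hypothetical helper for illustration, not part of the callback above):

```python
def should_stop(train_acc, valid_acc, train_thold, valid_thold):
    """Return True only when BOTH thresholds are met in the same epoch.

    Setting a threshold to 0.0 disables that criterion, matching the
    behaviour described above.
    """
    return train_acc >= train_thold and valid_acc >= valid_thold

print(should_stop(0.96, 0.86, 0.95, 0.85))  # both met -> True
print(should_stop(0.96, 0.80, 0.95, 0.85))  # validation short -> False
print(should_stop(0.96, 0.10, 0.95, 0.0))   # valid_thold=0.0 disables that check -> True
```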
