
How do I take l1 and l2 regularizers into account in TensorFlow custom training loops?

While experimenting with the model.train_on_batch method and custom training loops, I realized that in the custom training loop code the loss and gradients do not take any l1/l2 regularizers into account, and hence optimizer.apply_gradients ignores the regularizers as well. Below you can find code showing this, but the idea is pretty simple. So my question is: is there a way, agnostic to the optimizer's details, to make these training loops take the regularizers into account? How is it implemented in Keras? On a related note, model.train_on_batch returns a value that is not the loss (as claimed in the docstring) but something else. I was wondering if someone here knows what it actually returns.

Code

To see this effect, first create some data

import numpy as np
import tensorflow as tf

x = tf.constant([[1]])
y = tf.constant([[1]])

and create a function that builds a reproducible model

def make_model(l1=.01,l2=.01):
    tf.random.set_seed(42)
    np.random.seed(42)
    model=tf.keras.models.Sequential([
        tf.keras.layers.Dense(2,'softmax',
                              use_bias=False,
                              kernel_regularizer=tf.keras.regularizers.l1_l2(l1=l1,l2=l2),
                              input_shape=(1,))
    ])
    return model
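For reference, the kernel_regularizer above adds a penalty of l1 * Σ|w| + l2 * Σw² on the kernel weights. A minimal NumPy sketch of that formula (the kernel values below are made up for illustration):

```python
import numpy as np

def l1_l2_penalty(w, l1=0.01, l2=0.01):
    """Penalty that tf.keras.regularizers.l1_l2 adds for a weight tensor w."""
    w = np.asarray(w, dtype=float)
    return l1 * np.abs(w).sum() + l2 * (w ** 2).sum()

w = np.array([[0.5, -1.0]])   # hypothetical 1x2 kernel
print(l1_l2_penalty(w))       # ≈ 0.01*1.5 + 0.01*1.25 = 0.0275
```

This is the extra term that the naive custom training loop below leaves out of the loss.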

Now run Keras train_on_batch

model=make_model()
loss_object=tf.keras.losses.SparseCategoricalCrossentropy()
optimizer=tf.keras.optimizers.RMSprop()
model.compile(loss=loss_object,optimizer=optimizer)
model.train_on_batch(x,y)

and compare the outputs with the custom training loop, as explained in the link above as well as here

model=make_model()
loss_object=tf.keras.losses.SparseCategoricalCrossentropy()
optimizer=tf.keras.optimizers.RMSprop()

@tf.function
def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

train_step(x,y).numpy()

You will see that the two results differ unless l1 == 0 and l2 == 0.

Actually, I found the answer in Aurelien Geron's book.

In fact, after I implemented the code below, I found that this is covered in the TensorFlow guide on custom training (I don't know why it's not in the tutorials mentioned in the question, since it's an important point). The solution there is more general than the one shown here, but I am keeping this answer as it sheds a bit more light on what is happening.

So it is as simple as modifying the custom training loop to

def add_model_regularizer_loss(model):
    loss = 0
    for l in model.layers:
        if hasattr(l, 'layers') and l.layers:  # the layer itself is a model
            loss += add_model_regularizer_loss(l)  # recurse into nested models
        if hasattr(l, 'kernel_regularizer') and l.kernel_regularizer:
            loss += l.kernel_regularizer(l.kernel)
        if hasattr(l, 'bias_regularizer') and l.bias_regularizer:
            loss += l.bias_regularizer(l.bias)
    return loss

def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)
        loss += add_model_regularizer_loss(model)

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

To answer the second part of my question: it is this loss value, with the regularization terms included, that Keras's model fit method returns.

The recommended practice, as stated on the TF site, is to use model.losses. For example:

def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)
        loss += tf.add_n(model.losses)   # <--- SEE HERE

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
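Here model.losses collects one scalar penalty per regularizer (and per add_loss call), and tf.add_n simply sums that list into a single scalar. A plain-Python sketch of the reduction, with hypothetical penalty values:

```python
# Hypothetical per-regularizer penalties, e.g. one kernel term and one bias
# term; tf.add_n(model.losses) reduces such a list by elementwise summation.
model_losses = [0.015, 0.0125]
total_regularization = sum(model_losses)  # what tf.add_n computes here
print(total_regularization)               # ≈ 0.0275
```

This is equivalent to what the manual add_model_regularizer_loss helper above computes, but lets Keras do the bookkeeping.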
