How do I take l1 and l2 regularizers into account in tensorflow custom training loops?

Question

While playing with the model.train_on_batch method and custom training loops, I realized that in the custom training loop the loss and gradients do not take any l1/l2 regularizers into account, and hence optimizer.apply_gradients does not either. The code below demonstrates this, but the idea is pretty simple. So my question is whether there is a way to take the regularizers into account that works with any of these optimizers, without depending on optimizer details. How is it implemented in Keras? On a related note, model.train_on_batch returns a value that is not the loss (as claimed in the docstring) but something else. I was wondering if someone here knows what it returns.

Code

To see this effect, first create some data

import numpy as np
import tensorflow as tf
x=tf.constant([[1]])
y=tf.constant([[1]])

and create a function to make a reproducible model

def make_model(l1=.01,l2=.01):
    tf.random.set_seed(42)
    np.random.seed(42)
    model=tf.keras.models.Sequential([
        tf.keras.layers.Dense(2,'softmax',
                              use_bias=False,
                              kernel_regularizer=tf.keras.regularizers.l1_l2(l1=l1,l2=l2),
                              input_shape=(1,))
    ])
    return model

Now run Keras train_on_batch

model=make_model()
loss_object=tf.keras.losses.SparseCategoricalCrossentropy()
optimizer=tf.keras.optimizers.RMSprop()
model.compile(loss=loss_object,optimizer=optimizer)
model.train_on_batch(x,y)

and compare the output with a custom training loop written as in the TensorFlow tutorials

model=make_model()
loss_object=tf.keras.losses.SparseCategoricalCrossentropy()
optimizer=tf.keras.optimizers.RMSprop()

@tf.function
def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

train_step(x,y).numpy()

You will see the two results are different unless l1==0 and l2==0.

Answer 1

Actually, I found the answer in Aurelien Geron's book.

In fact, after I implemented the code below, I found that this is covered in the TensorFlow guide on custom training (I don't know why it's not in the tutorials mentioned in the question, since it's an important point). The solution there is more general than the one given here, but I am keeping this one as it sheds a bit more light on what's happening.

So it is as simple as modifying the custom training loop to

def add_model_regularizer_loss(model):
    # Sum the kernel and bias regularization penalties over all layers.
    loss=0
    for l in model.layers:
        if hasattr(l,'layers') and l.layers: # the layer itself is a model
            loss+=add_model_regularizer_loss(l) # recurse into the nested model
        if hasattr(l,'kernel_regularizer') and l.kernel_regularizer:
            loss+=l.kernel_regularizer(l.kernel)
        if hasattr(l,'bias_regularizer') and l.bias_regularizer:
            loss+=l.bias_regularizer(l.bias)
    return loss

def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)
        loss += add_model_regularizer_loss(model)

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

To answer the second part of my question: it is this loss value, with the regularization terms included, that Keras's model.train_on_batch returns.
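
For a sense of how large the regularization term is, here is a minimal pure-Python sketch of the penalty an l1_l2 regularizer contributes. It assumes the formula given in the Keras documentation (l1 times the sum of absolute values plus l2 times the sum of squares). The helper name l1_l2_penalty and the kernel values are made up for illustration; they are not the weights the seeded model actually gets.

```python
def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    """Penalty for a 2-D kernel given as a list of rows."""
    flat = [w for row in weights for w in row]
    l1_term = l1 * sum(abs(w) for w in flat)   # l1 * sum(|w|)
    l2_term = l2 * sum(w * w for w in flat)    # l2 * sum(w^2)
    return l1_term + l2_term

# Hypothetical kernel of shape (1, 2), the same shape as the Dense layer's kernel.
kernel = [[0.5, -1.0]]

print(l1_l2_penalty(kernel))              # about 0.0275
print(l1_l2_penalty(kernel, l1=0, l2=0))  # 0.0
```

With l1=0 and l2=0 the penalty vanishes, which is consistent with the two training paths agreeing only in that case.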

Answer 2

The recommended practice, as stated on the TF site, is to use model.losses. For example:

def train_step(x,y):

    with tf.GradientTape() as tape:
        predictions  = model(x)
        loss = loss_object(y, predictions)
        loss += tf.add_n(model.losses)   # <--- SEE HERE

    gradients = tape.gradient(loss, model.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
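
One caveat, as far as I know: tf.add_n raises a ValueError on an empty list, so the line above fails for a model that has no regularizers (model.losses is then []). Below is a minimal sketch of a guard, assuming only that model.losses is a plain Python list. The helper name regularization_term is hypothetical, and a SimpleNamespace stands in for a Keras model so the snippet runs without TensorFlow.

```python
from types import SimpleNamespace

def regularization_term(model):
    """Sum the entries of model.losses, or return 0.0 when there are none."""
    losses = list(model.losses)
    # Built-in sum() only needs "+", so it should also accept scalar tensors.
    return sum(losses) if losses else 0.0

# Stand-ins for Keras models (hypothetical values, not real tensors).
regularized = SimpleNamespace(losses=[0.015, 0.0125])
plain = SimpleNamespace(losses=[])

print(regularization_term(regularized))  # about 0.0275
print(regularization_term(plain))        # 0.0
```

Inside train_step the marked line would then read loss += regularization_term(model).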
