I set a learning rate decay for my Adam optimizer, like this:
from keras.optimizers import Adam

LR = 1e-3
LR_DECAY = 1e-2
OPTIMIZER = Adam(lr=LR, decay=LR_DECAY)
As the Keras documentation for Adam states, after each epoch the learning rate would be
lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))
If I understand correctly, the learning rate would then be
lr = init_lr / (1 + num_epoch * decay)
But when I print the learning rate out, I don't see the decay taking effect. Is there a problem with how I am using it?
Edit
I print out the learning rate by setting verbose=1 on the ReduceLROnPlateau callback, like this:
ReduceLROnPlateau(monitor='val_loss', factor=0.75, patience=Config.REDUCE_LR_PATIENCE, verbose=1, mode='auto', epsilon=0.01, cooldown=0, min_lr=1e-6)
This callback monitors val_loss and reduces the learning rate by multiplying it by factor. The printed learning rate looks like this:
Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0007500000356230885.
I set the initial learning rate to 1e-3, so it appears the learning rate changed from 1e-3 to 1e-3 * 0.75, which makes me suspect that the decay I set in Adam isn't working.
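For reference, here is a minimal, self-contained sketch of my setup; the toy model, data, patience and fit arguments are placeholders I made up for the example, not my real configuration:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

# Toy data and model, just to exercise the optimizer and the callback.
x = np.random.rand(200, 10)
y = np.random.rand(200, 1)

model = Sequential([Dense(16, activation='relu', input_shape=(10,)),
                    Dense(1)])
model.compile(optimizer=Adam(lr=1e-3, decay=1e-2), loss='mse')

# verbose=1 makes the callback print a line whenever it lowers `lr`.
# The callback only rewrites the optimizer's `lr` variable when val_loss
# stops improving; it knows nothing about the `decay` argument.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.75, patience=2,
                              verbose=1, mode='auto', cooldown=0, min_lr=1e-6)

model.fit(x, y, validation_split=0.2, epochs=10, batch_size=20,
          callbacks=[reduce_lr])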
The learning rate changes with every iteration, i.e., with every batch, not with every epoch. So if you set decay = 1e-2 and each epoch has 100 batches/iterations, then after 1 epoch your learning rate will be
lr = init_lr * 1. / (1 + 1e-2 * 100)
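To make the arithmetic concrete, here is a tiny sketch that just evaluates that formula in plain Python, with the numbers from the example above:

# Effective learning rate under Keras' time-based decay:
# lr_t = init_lr / (1 + decay * iterations), where `iterations`
# counts batches, not epochs.
init_lr = 1e-3
decay = 1e-2
batches_per_epoch = 100

for epoch in range(1, 4):
    iterations = epoch * batches_per_epoch
    print(epoch, init_lr / (1 + decay * iterations))
# epoch 1 -> 0.0005, epoch 2 -> ~0.000333, epoch 3 -> 0.00025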
So, if I want my learning rate to be 0.75 of the original learning rate at the end of the first epoch, I would set lr_decay to
batches_per_epoch = dataset_size / batch_size
lr_decay = (1. / 0.75 - 1) / batches_per_epoch
It seems to work for me. Note that this decay is hyperbolic in the number of iterations, so later epochs shrink the rate more slowly than a constant 0.75 factor per epoch would. Also, since the new learning rate is calculated on the fly at every iteration, the optimizer never changes the value of its learning rate variable; it always starts from the initial learning rate to compute the effective one, which is why ReduceLROnPlateau only ever prints the changes it made itself.
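If you want to watch the decayed rate during training, a rough sketch of a custom callback like the one below should work. It assumes the Keras 2 Adam attributes lr, decay and iterations (backend variables); it is not something built into Keras, and it reports the time-decayed base rate, before Adam's bias-correction terms:

from keras import backend as K
from keras.callbacks import Callback

class EffectiveLR(Callback):
    # Recompute the time-decayed base learning rate that Adam uses.
    # optimizer.lr itself stays at the initial value (or whatever
    # ReduceLROnPlateau last wrote to it); the decay is applied on
    # the fly from the iteration counter.
    def on_epoch_end(self, epoch, logs=None):
        opt = self.model.optimizer
        lr = float(K.eval(opt.lr))
        decay = float(K.eval(opt.decay))
        iterations = float(K.eval(opt.iterations))
        print(' - effective lr after epoch %d: %.6g'
              % (epoch + 1, lr / (1. + decay * iterations)))

# Usage: model.fit(..., callbacks=[EffectiveLR(), reduce_lr])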