I set a learning rate decay for my Adam optimizer, like this:
from keras.optimizers import Adam

LR = 1e-3
LR_DECAY = 1e-2
OPTIMIZER = Adam(lr=LR, decay=LR_DECAY)
As the Keras documentation for Adam states, after each epoch the learning rate would be
lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))
If I understand correctly, the learning rate would then be
lr = init_lr / (1 + num_epoch * decay)
But when I print the learning rate out, I don't see the decay taking effect. Is there a problem with how I am using it?
Edit
I print out the learning rate by setting verbose=1 on the ReduceLROnPlateau callback, like this:
ReduceLROnPlateau(monitor='val_loss', factor=0.75, patience=Config.REDUCE_LR_PATIENCE, verbose=1, mode='auto', epsilon=0.01, cooldown=0, min_lr=1e-6)
This callback monitors val_loss and reduces the learning rate by multiplying it by factor. The printed learning rate looks like this:
Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0007500000356230885.
I set the initial learning rate to 1e-3, so it appears the learning rate changed from 1e-3 to 1e-3 * 0.75, which makes me suspect that the decay I set in Adam isn't working.
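For reference, here is a minimal, self-contained sketch of my setup; the toy model, data, patience and fit arguments are placeholders I made up for the example, not my real configuration:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

# Toy data and model, just to exercise the optimizer and the callback.
x = np.random.rand(200, 10)
y = np.random.rand(200, 1)

model = Sequential([Dense(16, activation='relu', input_shape=(10,)),
                    Dense(1)])
model.compile(optimizer=Adam(lr=1e-3, decay=1e-2), loss='mse')

# verbose=1 makes the callback print a line whenever it lowers `lr`.
# The callback only rewrites the optimizer's `lr` variable when val_loss
# stops improving; it knows nothing about the `decay` argument.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.75, patience=2,
                              verbose=1, mode='auto', cooldown=0, min_lr=1e-6)

model.fit(x, y, validation_split=0.2, epochs=10, batch_size=20,
          callbacks=[reduce_lr])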
The learning rate changes with every iteration, i.e., with every batch, not with every epoch. So if you set decay = 1e-2 and each epoch has 100 batches/iterations, then after 1 epoch your learning rate will be
lr = init_lr * 1. / (1 + 1e-2 * 100)
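To make the arithmetic concrete, here is a tiny sketch that just evaluates that formula in plain Python, with the numbers from the example above:

# Effective learning rate under Keras' time-based decay:
# lr_t = init_lr / (1 + decay * iterations), where `iterations`
# counts batches, not epochs.
init_lr = 1e-3
decay = 1e-2
batches_per_epoch = 100

for epoch in range(1, 4):
    iterations = epoch * batches_per_epoch
    print(epoch, init_lr / (1 + decay * iterations))
# epoch 1 -> 0.0005, epoch 2 -> ~0.000333, epoch 3 -> 0.00025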
So, if I want my learning rate to be 0.75 of the original learning rate at the end of the first epoch, I would set lr_decay to
batches_per_epoch = dataset_size / batch_size
lr_decay = (1. / 0.75 - 1) / batches_per_epoch
It seems to work for me. Note that this decay is hyperbolic in the number of iterations, so later epochs shrink the rate more slowly than a constant 0.75 factor per epoch would. Also, since the new learning rate is calculated on the fly at every iteration, the optimizer never changes the value of its learning rate variable; it always starts from the initial learning rate to compute the effective one, which is why ReduceLROnPlateau only ever prints the changes it made itself.
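If you want to watch the decayed rate during training, a rough sketch of a custom callback like the one below should work. It assumes the Keras 2 Adam attributes lr, decay and iterations (backend variables); it is not something built into Keras, and it reports the time-decayed base rate, before Adam's bias-correction terms:

from keras import backend as K
from keras.callbacks import Callback

class EffectiveLR(Callback):
    # Recompute the time-decayed base learning rate that Adam uses.
    # optimizer.lr itself stays at the initial value (or whatever
    # ReduceLROnPlateau last wrote to it); the decay is applied on
    # the fly from the iteration counter.
    def on_epoch_end(self, epoch, logs=None):
        opt = self.model.optimizer
        lr = float(K.eval(opt.lr))
        decay = float(K.eval(opt.decay))
        iterations = float(K.eval(opt.iterations))
        print(' - effective lr after epoch %d: %.6g'
              % (epoch + 1, lr / (1. + decay * iterations)))

# Usage: model.fit(..., callbacks=[EffectiveLR(), reduce_lr])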