Keras: how does the learning rate change when the Adadelta optimizer is used?
For example, I use Adadelta as the optimizer when compiling the network model; the learning rate will then change over time according to this rule (but what is iterations?), and how can I log the learning rate value to the console?
model.compile(loss=keras.losses.mean_squared_error,
              optimizer=keras.optimizers.Adadelta())
In the documentation, is lr just the starting learning rate?
The rule is related to updates with decay. Adadelta is an adaptive learning rate method that uses an exponentially decaying average of gradients.
Looking at the Keras source code, the learning rate is recalculated based on decay like:
lr = self.lr
if self.initial_decay > 0:
    lr *= (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))
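As a standalone sketch (plain Python, no Keras; the function name is mine for illustration), the schedule above is just inverse time decay, where iterations counts the optimizer update steps taken so far:

```python
def effective_lr(initial_lr, decay, iterations):
    """Inverse time decay, as in the Keras snippet above."""
    return initial_lr * (1.0 / (1.0 + decay * iterations))

# With decay=0 the rate never changes; with decay > 0 it shrinks per step.
for step in (0, 100, 1000):
    print(step, effective_lr(1.0, 1e-3, step))
```

With decay=1e-3, the rate halves by step 1000 (1 / (1 + 0.001 * 1000) = 0.5), which shows why a nonzero decay matters for the schedule.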
So yes, lr is just the starting learning rate.
To print it after every epoch, as @orabis mentioned, you can make a callback class:
from keras import backend as K
from keras.callbacks import Callback

class YourLearningRateTracker(Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr
        decay = self.model.optimizer.decay
        iterations = self.model.optimizer.iterations
        # Recompute the decayed rate the same way the optimizer does.
        lr_with_decay = lr / (1. + decay * K.cast(iterations, K.dtype(decay)))
        print(K.eval(lr_with_decay))
and then add an instance of it to the callbacks when calling model.fit(), like:
model.fit(..., callbacks=[YourLearningRateTracker()])
However, note that, by default, the decay parameter for Adadelta is zero and is not part of the "standard" arguments, so your learning rate would not change its value when using default arguments. I suspect that decay is not intended to be used with Adadelta.
On the other hand, the rho parameter, which is nonzero by default, does not describe the decay of the learning rate; it corresponds to the fraction of the gradient to keep at each time step (according to the Keras documentation).
I found some relevant information in this GitHub issue, and by asking a similar question.