Keras: how does the learning rate change when the Adadelta optimizer is used?
For example, I use Adadelta as the optimizer when compiling the network model; the learning rate will then change over time according to this rule (but what is iterations?), and how can I log the learning rate value to the console?
model.compile(loss=keras.losses.mean_squared_error,
              optimizer=keras.optimizers.Adadelta())
In the documentation, is lr just the starting learning rate?
The rule is related to updates with decay. Adadelta is an adaptive learning rate method that uses an exponentially decaying average of gradients.
Looking at the Keras source code, the learning rate is recalculated based on decay like:
lr = self.lr
if self.initial_decay > 0:
    lr *= (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))
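As a standalone sketch (plain Python, no Keras; the function name is mine for illustration), the schedule above is just inverse time decay, where iterations counts the optimizer update steps taken so far:

```python
def effective_lr(initial_lr, decay, iterations):
    """Inverse time decay, as in the Keras snippet above."""
    return initial_lr * (1.0 / (1.0 + decay * iterations))

# With decay=0 the rate never changes; with decay > 0 it shrinks per step.
for step in (0, 100, 1000):
    print(step, effective_lr(1.0, 1e-3, step))
```

With decay=1e-3, the rate halves by step 1000 (1 / (1 + 0.001 * 1000) = 0.5), which shows why a nonzero decay matters for the schedule.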
So yes, lr is just the starting learning rate.
To print it after every epoch, as @orabis mentioned, you can make a callback class:
from keras import backend as K
from keras.callbacks import Callback

class YourLearningRateTracker(Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr
        decay = self.model.optimizer.decay
        iterations = self.model.optimizer.iterations
        # Recompute the decayed rate the same way the optimizer does.
        lr_with_decay = lr / (1. + decay * K.cast(iterations, K.dtype(decay)))
        print(K.eval(lr_with_decay))
and then add an instance of it to the callbacks when calling model.fit(), like:
model.fit(..., callbacks=[YourLearningRateTracker()])
However, note that, by default, the decay parameter for Adadelta is zero and is not part of the "standard" arguments, so your learning rate would not change its value when using default arguments. I suspect that decay is not intended to be used with Adadelta.
On the other hand, the rho parameter, which is nonzero by default, does not describe the decay of the learning rate; it corresponds to the fraction of the gradient to keep at each time step (according to the Keras documentation).
I found some relevant information in this GitHub issue, and by asking a similar question.