
Learning rate and weight decay schedule in Tensorflow SGDW optimizer

I'm trying to reproduce part of this paper with TensorFlow. The problem is that the authors use SGD with weight decay, cutting the learning rate to 1/10 every 30 epochs.

The TensorFlow documentation says that

when applying a decay to the learning rate, be sure to manually apply the decay to the weight_decay as well

So I tried:

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.003,
    decay_rate=0.1,
    decay_steps=steps_per_epoch * 30,
    staircase=True)

optimizer = tfa.optimizers.SGDW(
    learning_rate=schedule,
    weight_decay=schedule,
    momentum=0.9)

(steps_per_epoch is initialized earlier.)
This is how I would do it with Keras SGD, but here it doesn't work: it raises "TypeError: Expected float32" for the weight_decay parameter. What's the correct way to achieve the target behaviour?

You are getting the error because you are using the Keras ExponentialDecay schedule inside the TensorFlow Addons optimizer SGDW.
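In the tensorflow-addons release that raises this error, weight_decay is expected to be a plain float (or a zero-argument callable), so passing a LearningRateSchedule object fails with "Expected float32". A minimal sketch of a workaround, assuming your tensorflow-addons version accepts callables for weight_decay and with steps_per_epoch as a placeholder value:

import tensorflow as tf
import tensorflow_addons as tfa

steps_per_epoch = 100  # placeholder; use your own value

# learning rate: 0.003, cut to 1/10 every 30 epochs
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.003,
    decay_rate=0.1,
    decay_steps=steps_per_epoch * 30,
    staircase=True)

# weight decay: same decay pattern, but starting from its own initial value (0.001)
wd_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,
    decay_rate=0.1,
    decay_steps=steps_per_epoch * 30,
    staircase=True)

optimizer = tfa.optimizers.SGDW(
    learning_rate=lr_schedule,
    # zero-argument callable, re-evaluated at every optimizer step; the lambda
    # is only called after `optimizer` exists, so the self-reference is fine
    weight_decay=lambda: wd_schedule(optimizer.iterations),
    momentum=0.9)

If you would rather not depend on that behaviour, you can keep the optimizer hyper-parameters fixed and adjust them from a Keras callback instead, as shown below.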

As per the paper, the hyper-parameters are

  1. weight decay of 0.001
  2. momentum of 0.9
  3. starting learning rate of 0.003, reduced by a factor of 10 every 30 epochs

So why not use a LearningRateScheduler callback to reduce the learning rate by a factor of 10 every 30 epochs?

Sample code

import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

# toy model: 10 input features, 3 output classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(10,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),
])

# dummy data: 10 samples with one-hot labels for the 3 classes
X = np.random.randn(10, 10)
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, 10), num_classes=3)

# SGDW with the paper's hyper-parameters: weight decay 0.001, momentum 0.9, lr 0.003
model.compile(
    optimizer=tfa.optimizers.SGDW(
        weight_decay=0.001,
        momentum=0.9,
        learning_rate=0.003),
    loss=tf.keras.losses.categorical_crossentropy)

# multiply the learning rate by 0.1 every 30 epochs (leave epoch 0 untouched)
def scheduler(epoch, lr):
  if epoch > 0 and epoch % 30 == 0:
    lr = lr * 0.1
  return lr

callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.fit(X, y, callbacks=[callback], epochs=100)
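Note that LearningRateScheduler only updates the optimizer's learning_rate; the weight_decay of SGDW stays at 0.001, whereas the documentation quoted in the question says it should be decayed in step with the learning rate. Below is a minimal sketch of a (hypothetical) custom callback that scales both, assuming weight_decay is exposed as a regular Keras hyper-parameter that backend get_value/set_value can read and write, as it is on tfa.optimizers.SGDW:

import tensorflow as tf

class DecayLrAndWd(tf.keras.callbacks.Callback):
    """Multiply learning_rate and weight_decay by `factor` every `every` epochs."""

    def __init__(self, factor=0.1, every=30):
        super().__init__()
        self.factor = factor
        self.every = every

    def on_epoch_begin(self, epoch, logs=None):
        # leave epoch 0 untouched so training starts at the initial values
        if epoch > 0 and epoch % self.every == 0:
            opt = self.model.optimizer
            for name in ("learning_rate", "weight_decay"):
                value = tf.keras.backend.get_value(getattr(opt, name))
                tf.keras.backend.set_value(getattr(opt, name), value * self.factor)

model.fit(X, y, callbacks=[DecayLrAndWd(factor=0.1, every=30)], epochs=100)

If you use this callback, drop the LearningRateScheduler above, since the custom callback already handles the learning rate.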
