Pytorch: Is there a way to implement layer-wise learning rate decay when using a Scheduler?

I want to implement layer-wise learning rate decay while still using a Scheduler. Specifically, what I currently have is:

import torch

model = Model()
optim = torch.optim.Adam(model.parameters(), lr=0.1)
# OneCycleLR also requires total_steps (or epochs plus steps_per_epoch)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optim, max_lr=0.1, total_steps=total_steps)

Then, the learning rate increases to 0.1 over the first 30% of the epochs and gradually decays over time. I want to combine this with layer-wise learning rate decay.

This tutorial is something that I want to implement, but it uses a fixed LR instead of a changing LR like the one produced by a Scheduler. What I want is that at every step, the model still uses the LR it gets from the scheduler, but each layer's LR is also decayed by a fixed factor. It goes like:

for i in range(steps):
    lr = scheduler.get_last_lr()[0]                # base LR from the scheduler
    for idx, layer in enumerate(model.layers()):   # pseudocode: iterate over the layers
        layer['lr'] = lr * 0.9 ** (idx + 1)        # pseudocode: per-layer decayed LR
    output = model(input)
    ...

However, when doing this, do I have to pass model.parameters() to the optimizer again? How would the LR be computed in this scenario? Is there a better way to do this?

I am also looking for a way to do this for very large models, where listing every layer and specifying an LR for each of them is rather tedious.

If you want something other than a plain-vanilla, PyTorch-preimplemented schedule for your learning rate, I recommend forgoing the PyTorch scheduler class and manually adjusting the learning rates for each of the parameter groups yourself. You can access the learning rates directly, similar to your code above, but through the optimizer's parameter group handles rather than the model's layers:

for group in optim.param_groups:
    group["lr"] *= 0.9      # for example

From here, you can either use a list of decay factors or a dictionary keyed by the parameter group names to keep this concise.
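
For example, here is a minimal sketch of that idea, combining per-layer parameter groups with a manually adjusted schedule. The toy nn.Sequential model, the extra "name"/"decay" keys stored on each parameter group, and the hand-rolled warm-up/cosine schedule standing in for OneCycleLR are illustrative assumptions, not the only way to set this up:

import math
import torch
import torch.nn as nn

# Toy model just for illustration; substitute your own.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 8),
)

# One parameter group per layer, tagged with a name and its own decay factor.
# Extra keys such as "name" and "decay" are kept in optimizer.param_groups,
# so they can be read back inside the training loop.
param_groups = []
for idx, (name, module) in enumerate(model.named_children()):
    params = list(module.parameters())
    if not params:                        # e.g. ReLU has no parameters
        continue
    param_groups.append({
        "params": params,
        "name": name,
        "decay": 0.9 ** (idx + 1),        # layer-wise decay factor
    })

base_max_lr = 0.1
optim = torch.optim.Adam(param_groups, lr=base_max_lr)

total_steps = 1000
warmup_steps = int(0.3 * total_steps)     # roughly OneCycleLR's 30% warm-up

def base_lr_at(step):
    # Hand-rolled one-cycle-style schedule: linear warm-up, then cosine decay.
    if step < warmup_steps:
        return base_max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in range(total_steps):
    lr = base_lr_at(step)
    for group in optim.param_groups:
        group["lr"] = lr * group["decay"]   # scheduled base LR scaled per layer
    # ... forward pass, loss.backward(), optim.step(), optim.zero_grad()

Because the per-layer factors live on the parameter groups themselves, you never have to pass model.parameters() to the optimizer again: the optimizer already holds every parameter, and the effective LR of a group at any step is just the scheduled base LR times that group's decay factor. Building the groups from model.named_children() (or named_parameters()) also avoids listing every layer by hand in large models.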
