What is the relation between a learning rate scheduler and an optimizer?
If I have a model:
import torch
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(2, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.out(x)
        return x

nx = net_x()
And then I'm defining my inputs, optimizer (with lr=0.1), scheduler (with base_lr=1e-3), and training:
r = torch.tensor([1.0, 2.0])
optimizer = optim.Adam(nx.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1, step_size_up=1, mode="triangular2", cycle_momentum=False)
path = 'opt.pt'

for epoch in range(10):
    optimizer.zero_grad()
    net_predictions = nx(r)
    loss = torch.sum(torch.randint(0, 10, (4,)) - net_predictions)
    loss.backward()
    optimizer.step()
    scheduler.step()
    print('loss:', loss)

    # save state dict
    torch.save({
        'epoch': epoch,
        'net_x_state_dict': nx.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),
    }, path)

# loading state dict
checkpoint = torch.load(path)
nx.load_state_dict(checkpoint['net_x_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler'])
The optimizer seems to take the learning rate of the scheduler:
for g in optimizer.param_groups:
    print(g)
>>>
{'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'initial_lr': 0.001, 'params': [Parameter containing:
Does the learning rate scheduler overwrite the optimizer's learning rate? How does it connect to it? I'm trying to understand the relation between them (i.e., how they interact).
TL;DR: The LR scheduler holds the optimizer as a member and explicitly alters the learning rates of its parameter groups.
As mentioned in the official PyTorch documentation, the learning rate scheduler receives the optimizer as an argument to its constructor, and thus has access to its parameter groups.
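As a minimal sketch of that connection, the snippet below builds the same kind of optimizer and CyclicLR scheduler as in the question and checks two things: that the scheduler keeps a reference to the very same optimizer object, and that constructing the scheduler already sets the optimizer's lr to base_lr. The nn.Linear model is a hypothetical stand-in, and the behaviour shown is what I would expect from recent PyTorch releases rather than a guarantee for every version.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 4)  # hypothetical stand-in for the question's net_x
optimizer = optim.Adam(model.parameters(), lr=0.1)
lr_before = optimizer.param_groups[0]["lr"]  # the lr passed to Adam

scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-3, max_lr=0.1, step_size_up=1,
    mode="triangular2", cycle_momentum=False,
)

# The scheduler stores the optimizer it was given, not a copy.
same_object = scheduler.optimizer is optimizer
# Constructing the scheduler has already written a new lr into the optimizer.
lr_after_init = optimizer.param_groups[0]["lr"]

print(lr_before, lr_after_init, same_object)
```

This would explain why the printed param group in the question shows lr and initial_lr equal to base_lr rather than the 0.1 given to Adam.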
A common pattern is to update the LR after every epoch:
scheduler = ...  # initialize some LR scheduler
for epoch in range(100):
    train(...)  # here optimizer.step() is called numerous times.
    validate(...)
    scheduler.step()
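A runnable version of this pattern might look like the sketch below. The model, the input, the loss, and the choice of StepLR with gamma=0.5 are all assumptions made for illustration; they are not taken from the question. The point is only that optimizer.step() runs many times per epoch while scheduler.step() runs once.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 4)  # hypothetical stand-in model
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Assumed scheduler for illustration: halve the LR after every epoch.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

x = torch.tensor([1.0, 2.0])  # dummy input
lr_per_epoch = []
for epoch in range(3):
    # Record the LR the optimizer uses during this epoch.
    lr_per_epoch.append(optimizer.param_groups[0]["lr"])
    for _ in range(5):  # several optimizer updates per epoch
        optimizer.zero_grad()
        loss = model(x).sum()  # dummy loss
        loss.backward()
        optimizer.step()
    scheduler.step()  # one LR update per epoch

print(lr_per_epoch)
```

Note that some schedulers, CyclicLR among them, are documented as being stepped after every batch rather than after every epoch.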
All optimizers inherit from a common parent class, torch.optim.Optimizer, and perform their updates through the step method that each of them implements.
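A quick check of the inheritance, assuming a standard PyTorch install (Adam and SGD are just two examples picked here):

```python
import torch.optim as optim

# Built-in optimizers derive from the base class in torch.optim.
adam_is_optimizer = issubclass(optim.Adam, optim.Optimizer)
sgd_is_optimizer = issubclass(optim.SGD, optim.Optimizer)
print(adam_is_optimizer, sgd_is_optimizer)
```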
Similarly, all LR schedulers (besides ReduceLROnPlateau) inherit from a common parent class named _LRScheduler. Its source code shows that the step method does indeed change the LR of the optimizer's parameter groups:
...
for i, data in enumerate(zip(self.optimizer.param_groups, values)):
    param_group, lr = data
    param_group['lr'] = lr
...
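To make the mechanism concrete, here is a toy scheduler that does nothing more than the excerpt above: it keeps a reference to the optimizer and writes a new value into each param_group['lr'] on every step. HalvingScheduler is a hypothetical class written for this illustration, not part of PyTorch, and the halving rule is an arbitrary choice.

```python
import torch.nn as nn
import torch.optim as optim

class HalvingScheduler:
    """Hypothetical toy scheduler, not a PyTorch class."""

    def __init__(self, optimizer):
        # Keep a reference to the optimizer, as the real schedulers do.
        self.optimizer = optimizer

    def step(self):
        # Overwrite the LR stored in every parameter group.
        for param_group in self.optimizer.param_groups:
            param_group["lr"] = param_group["lr"] * 0.5

model = nn.Linear(2, 4)  # hypothetical stand-in model
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = HalvingScheduler(optimizer)

lrs = []
for _ in range(3):
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])  # read back from the optimizer

print(lrs)
```

The optimizer never sees the scheduler; it simply reads whatever lr is currently stored in its parameter groups the next time its own step method runs.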