PyTorch backward does not compute the gradients for requested variables
I'm trying to train a resnet18 model in PyTorch (+ pytorch-lightning) using Virtual Adversarial Training. During the computations required for this type of training I need to obtain the gradient of D (i.e. the cross-entropy loss of the model) with respect to the tensor r.
This should, in theory, happen in the following code snippet:
def generic_step(self, train_batch, batch_idx, step_type):
    x, y = train_batch
    unlabeled_idx = y is None
    d = torch.rand(x.shape).to(x.device)
    d = d/(torch.norm(d) + 1e-8)
    pred_y = self.classifier(x)
    y[unlabeled_idx] = pred_y[unlabeled_idx]
    l = self.criterion(pred_y, y)
    R_adv = torch.zeros_like(x)
    for _ in range(self.ip):
        r = self.xi * d
        r.requires_grad = True
        pred_hat = self.classifier(x + r)
        # pred_hat = F.log_softmax(pred_hat, dim=1)
        D = self.criterion(pred_hat, pred_y)
        self.classifier.zero_grad()
        D.requires_grad=True
        D.backward()
        R_adv += self.eps * r.grad / (torch.norm(r.grad) + 1e-8)
    R_adv /= 32
    loss = l + R_adv * self.a
    loss.backward()
    self.accuracy[step_type] = self.acc_metric(torch.argmax(pred_y, 1), y)
    return loss
Here, to my understanding, r.grad should in theory be the gradient of D with respect to r. However, the code throws this at D.backward():
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
(Full traceback omitted because this error is not helpful and is technically "solved": I know its cause, explained just below.)
After some research and debugging, it seems that in this situation D.backward() attempts to calculate dD/dD, disregarding any previous mention of requires_grad=True. This is confirmed when I add D.requires_grad=True: I get D.grad=Tensor(1.,device='cuda:0') but r.grad=None.
Does anyone know why this may be happening?
In Lightning, .backward() and the optimizer step are both handled under the hood. If you call them yourself as in the code above, it will interfere with Lightning, because Lightning doesn't know you called backward yourself.
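The symptom described in the question (D.grad equal to 1 while r.grad stays None) can be reproduced outside Lightning. The following is a minimal sketch in plain PyTorch, not a confirmed diagnosis of the question's setup: it assumes the forward pass ran with gradient tracking disabled, as happens inside torch.no_grad(). The names classifier, divergence, target and the tensor shapes are hypothetical stand-ins, and a KL divergence is swapped in for the question's criterion.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for self.classifier in the question.
classifier = nn.Linear(4, 3)
x = torch.randn(2, 4)

# Fixed target distribution, computed without tracking gradients.
with torch.no_grad():
    target = F.softmax(classifier(x), dim=1)


def divergence(r):
    """KL(target || model(x + r)), playing the role of D."""
    log_pred = F.log_softmax(classifier(x + r), dim=1)
    return F.kl_div(log_pred, target, reduction="batchmean")


# Case 1: the graph is recorded, so D has a grad_fn and r.grad is filled in.
r = (0.1 * torch.randn_like(x)).requires_grad_()
D = divergence(r)
D.backward()
grad_ok = r.grad is not None and r.grad.shape == x.shape

# Case 2: gradient tracking is disabled during the forward pass,
# so D2 is a plain leaf tensor with no link back to r2.
r2 = (0.1 * torch.randn_like(x)).requires_grad_()
with torch.no_grad():
    D2 = divergence(r2)

try:
    D2.backward()
    error_message = ""
except RuntimeError as exc:
    error_message = str(exc)

# Forcing requires_grad on D2 only turns it into a fresh leaf:
# backward() then computes dD2/dD2 = 1 and never reaches r2.
D2.requires_grad_(True)
D2.backward()
```

In the second case D2.grad is 1 and r2.grad is None, which matches what the question reports. If this is what is happening, setting D.requires_grad=True hides the error rather than fixing it.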
You can enable manual optimization in the LightningModule:
def __init__(self):
    super().__init__()
    # put this in your init
    self.automatic_optimization = False
This tells Lightning that you are taking over calling backward and handling the optimizer step + zero grad yourself. Don't forget to add that to your code above. You can access the optimizer and scheduler like so in your training step:
def training_step(self, batch, batch_idx):
    optimizer = self.optimizers()
    scheduler = self.lr_schedulers()
    # do your training step
    # don't forget to call:
    # 1) backward 2) optimizer step 3) zero grad
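The three calls named in the comment can be sketched in plain PyTorch to show their order. This is an illustration under assumptions, not the asker's model: model, optimizer, criterion and the random batch are hypothetical. Inside a LightningModule with manual optimization, the optimizer would come from self.optimizers() and the backward call would be self.manual_backward(loss) rather than loss.backward().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the module's classifier, optimizer and loss.
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Random batch, for illustration only.
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

before = model.weight.detach().clone()

loss = criterion(model(x), y)
loss.backward()        # 1) backward
optimizer.step()       # 2) optimizer step
optimizer.zero_grad()  # 3) zero grad

weights_changed = not torch.equal(before, model.weight.detach())
# zero_grad() either sets gradients to None or fills them with zeros,
# depending on the PyTorch version and arguments.
grads_cleared = all(
    p.grad is None or not bool(p.grad.any()) for p in model.parameters()
)
```

Clearing gradients after the step keeps any extra backward pass, such as the one used to obtain r.grad, from leaking into the next update.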
Read more about manual optimization in the Lightning documentation.