PyTorch backward does not compute the gradient of the requested variable

Question

I'm trying to train a resnet18 model in PyTorch (with pytorch-lightning) using Virtual Adversarial Training. During the computations required for this type of training, I need to obtain the gradient of D (i.e. the cross-entropy loss of the model) with respect to the tensor r.

In theory, this should happen in the following code snippet:

def generic_step(self, train_batch, batch_idx, step_type):
    x, y = train_batch
    unlabeled_idx = y is None

    d = torch.rand(x.shape).to(x.device)
    d = d/(torch.norm(d) + 1e-8)

    pred_y = self.classifier(x)
    y[unlabeled_idx] = pred_y[unlabeled_idx]
    l = self.criterion(pred_y, y)
    R_adv = torch.zeros_like(x)
    for _ in range(self.ip):
        r = self.xi * d
        r.requires_grad = True
        pred_hat = self.classifier(x + r)
        # pred_hat = F.log_softmax(pred_hat, dim=1)
        D = self.criterion(pred_hat, pred_y)
        self.classifier.zero_grad()
        D.requires_grad=True
        D.backward()
        R_adv += self.eps * r.grad / (torch.norm(r.grad) + 1e-8)

    R_adv /= 32
    loss = l + R_adv * self.a
    loss.backward()
    self.accuracy[step_type] = self.acc_metric(torch.argmax(pred_y, 1), y)
    return loss

Here, to my understanding, r.grad should be the gradient of D with respect to r. However, the code throws this at D.backward():

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

(Full traceback excluded because it is not helpful, and the error is technically "solved" since I know its cause, explained just below.)

After some research and debugging, it seems that in this situation D.backward() attempts to calculate dD/dD, disregarding any previous mention of requires_grad=True. This is confirmed when I add D.requires_grad=True: I get D.grad=Tensor(1.,device='cuda:0') but r.grad=None.

Does anyone know why this may be happening?

Answer 1

In Lightning, .backward() and the optimizer step are handled under the hood. If you call them yourself as in the code above, it will interfere with Lightning, because it doesn't know you called backward yourself.

You can enable manual optimization in the LightningModule:

def __init__(self):
    super().__init__()

    # put this in your init
    self.automatic_optimization = False

This tells Lightning that you are taking over calling backward and handling the optimizer step and zero grad yourself. Don't forget to add that to your code above. You can access the optimizer and scheduler like so in your training step:

def training_step(self, batch, batch_idx):

    optimizer = self.optimizers()
    scheduler = self.lr_schedulers()

    # do your training step
    # don't forget to call:
    # 1) backward 2) optimizer step 3) zero grad

Read more about manual optimization in the Lightning documentation.
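
The symptom described in the question (D.grad equal to 1 while r.grad stays None) can be reproduced in plain PyTorch whenever the forward pass runs with gradient tracking disabled. The sketch below shows that case and then the tracked case, using torch.autograd.grad to get the gradient with respect to r only. Whether disabled tracking is what actually happens in the question's generic_step is an assumption (Lightning does run validation without gradient tracking, and the step appears to be shared across step types); the linear model, the shapes and the hard labels are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 3)        # hypothetical stand-in for the classifier
x = torch.randn(2, 4)
target = torch.tensor([0, 1])  # hard labels, for simplicity
r = (1e-6 * torch.rand_like(x)).requires_grad_(True)

# Case 1: forward pass with gradient tracking disabled.
with torch.no_grad():
    D = F.cross_entropy(model(x + r), target)

# No graph was recorded, so calling D.backward() at this point would raise
# "element 0 of tensors does not require grad and does not have a grad_fn".
no_graph = D.grad_fn is None and not D.requires_grad

# Forcing requires_grad only turns D into a leaf: backward then computes
# dD/dD = 1 and never reaches r.
D.requires_grad = True
D.backward()
d_grad, r_grad_after_backward = D.grad, r.grad

# Case 2: tracking explicitly enabled for the perturbation step.
with torch.enable_grad():
    D2 = F.cross_entropy(model(x + r), target)
    # Gradient with respect to r only; parameter .grad fields stay untouched.
    (grad_r,) = torch.autograd.grad(D2, r)
```

Using torch.autograd.grad for the inner step also avoids accumulating gradients into the classifier's parameters, so there is no need to call zero_grad just to discard them.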

Answer 1 | 0 | 2022-09-03 19:41:13
