
Pytorch backward does not compute the gradients for requested variables

I'm trying to train a resnet18 model in PyTorch (+ pytorch-lightning) using Virtual Adversarial Training. During the computations required for this type of training I need to obtain the gradient of D (i.e., the cross-entropy loss of the model) with respect to the tensor r .

This should, in theory, happen in the following code snippet:

def generic_step(self, train_batch, batch_idx, step_type):
    x, y = train_batch
    unlabeled_idx = y is None

    d = torch.rand(x.shape).to(x.device)
    d = d/(torch.norm(d) + 1e-8)

    pred_y = self.classifier(x)
    y[unlabeled_idx] = pred_y[unlabeled_idx]
    l = self.criterion(pred_y, y)
    R_adv = torch.zeros_like(x)
    for _ in range(self.ip):
        r = self.xi * d
        r.requires_grad = True
        pred_hat = self.classifier(x + r)
        # pred_hat = F.log_softmax(pred_hat, dim=1)
        D = self.criterion(pred_hat, pred_y)
        self.classifier.zero_grad()
        D.requires_grad=True
        D.backward()
        R_adv += self.eps * r.grad / (torch.norm(r.grad) + 1e-8)

    R_adv /= 32
    loss = l + R_adv * self.a
    loss.backward()
    self.accuracy[step_type] = self.acc_metric(torch.argmax(pred_y, 1), y)
    return loss

Here, to my understanding, r.grad should in theory be the gradient of D with respect to r . However, the code throws this at D.backward() :

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn (full traceback excluded because this error is not helpful and is technically "solved", as I know the cause for it, explained just below)

After some research and debugging, it seems that in this situation D.backward() attempts to calculate dD/dD, disregarding any previous mention of requires_grad=True . This is confirmed when I add D.requires_grad=True : I then get D.grad=Tensor(1., device='cuda:0') but r.grad=None .
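For reference, a quick way to check whether autograd recorded the forward pass at all is to inspect the grad state right before the backward call (a hypothetical diagnostic, not part of the original code):

# placed just before D.backward()
print(torch.is_grad_enabled())      # False would mean this step runs under no_grad
print(r.requires_grad, r.is_leaf)   # r should be a leaf tensor that requires grad
print(D.grad_fn)                    # None means no graph connects D back to r

If D.grad_fn is None, setting D.requires_grad=True afterwards only turns D itself into a leaf that can receive a gradient (hence D.grad=1); it does not rebuild the graph down to r .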

Does anyone know why this may be happening?

In Lightning, .backward() and the optimizer step are handled under the hood. If you call backward yourself like in the code above, it will interfere with Lightning, because it doesn't know you called backward yourself.

You can enable manual optimization in the LightningModule:

def __init__(self):
    super().__init__()

    # put this in your init
    self.automatic_optimization = False

This tells Lightning that you are taking over calling backward and handling the optimizer step + zero grad yourself. Don't forget to add that to your code above. You can access the optimizer and scheduler like so in your training step:

def training_step(self, batch, batch_idx):

    optimizer = self.optimizers()
    scheduler = self.lr_schedulers()

    # do your training step
    # don't forget to call:
    # 1) backward 2) optimizer step 3) zero grad
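
Putting the pieces together, a minimal sketch of what this could look like (a hypothetical compute_loss method stands in for the VAT computation from the question; self.manual_backward is Lightning's hook for backward in manual-optimization mode):

import torch
import pytorch_lightning as pl

class VATClassifier(pl.LightningModule):
    def __init__(self, classifier):
        super().__init__()
        self.classifier = classifier
        self.criterion = torch.nn.CrossEntropyLoss()
        # take over backward / optimizer step / zero grad
        self.automatic_optimization = False

    def compute_loss(self, batch):
        # hypothetical placeholder for the VAT loss from the question
        x, y = batch
        return self.criterion(self.classifier(x), y)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)

        opt.zero_grad()             # clear old gradients
        self.manual_backward(loss)  # backward via Lightning's hook, not loss.backward()
        opt.step()                  # optimizer step
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.classifier.parameters(), lr=1e-2)

With self.automatic_optimization = False, Lightning no longer calls backward for you, so extra backward calls inside your training step (like the one in the VAT loop) no longer collide with Lightning's own handling.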

Read more about manual optimization in the PyTorch Lightning documentation.
