PyTorch Autograd for Regression

Another PyTorch newbie here, trying to understand the computational graph and autograd.

I'm training the following model on a potential energy and the corresponding force.

import torch
from torch import nn
from torch.autograd import grad

model = nn.Sequential(
    nn.Linear(1, 32),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1)
)

optimizer = torch.optim.Adam(model.parameters())
loss = nn.MSELoss()
# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=True).view(-1, 1)
E = 1 / r
F = -grad(E.sum(), r)[0]

inputs = r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    F_pred = -grad(E_pred.sum(), r, create_graph=True, retain_graph=True)[0]

    optimizer.zero_grad()
    error = loss(E_pred, E.data) + loss(F_pred, F.data)
    error.backward()
    optimizer.step()

However, if I change inputs = r to inputs = 1*r, the training loop breaks and gives the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Could you please explain why this happens?

This error occurs when backward is executed a second time through a graph that has already been backwarded through (and therefore freed). Here is example code that reproduces it.

output = model.forward(x)
loss = criterion(output, label)

optimizer.zero_grad()
loss.backward()                   # frees the intermediate values of the graph

loss2 = criterion(output, label)  # reuses the same (already freed) graph through output
loss2.backward()                  # RuntimeError: backward through the graph a second time

optimizer.step()

And as you can see in the following code, if you just assign r to inputs, a shallow copy occurs: inputs is the same tensor as r, so when the value of r changes, the value of inputs changes too. However, if you multiply by 1, it becomes a deep copy (a new tensor), and its values do not change even if r is changed.

r = torch.linspace(0.95, 3, 50).view(-1, 1)

inputs_1 = r      # shallow copy: the same tensor object as r
inputs_2 = 1 * r  # deep copy: a new tensor holding the same values

r[0] = 100

print(inputs_1)   # first element is now 100
print(inputs_2)   # first element is still 0.95

And requires_grad of E.data is False, so you can conclude that the error is caused by inputs. Also, optimizer.zero_grad() resets only the gradients of the model parameters; it does not reset the gradient of E or inputs (a small demonstration of this follows the snippet below).

print(E.data.requires_grad)  # False
# You only want to update the parameters of the model
optimizer = torch.optim.Adam(model.parameters())
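
Here is a minimal sketch of that last point. The one-layer model and the input x are hypothetical stand-ins, not the tensors from the question; it only shows that optimizer.zero_grad() clears the parameter gradients and leaves the gradient accumulated on an input tensor untouched:

import torch
from torch import nn

model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters())

x = torch.ones(4, 1, requires_grad=True)  # an input that (unnecessarily) requires grad
model(x).sum().backward()

print(x.grad is None)     # False: a gradient has accumulated on the input
optimizer.zero_grad()
print(x.grad is None)     # still False: zero_grad() did not touch the input
print(model.weight.grad)  # cleared (None or zeros, depending on set_to_none)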

As I said before, inputs = r is a shallow copy and inputs = 1 * r is a deep copy, so the following difference occurs (a small sketch reproducing it follows this list).

  1. In the shallow-copy case, inputs is r itself (a leaf tensor), so the gradient just accumulates on r and no error occurs.

  2. However, since 1 * r is a computed value (a node created once, outside the training loop), an error occurs if backward is run through it several times.
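
Here is a minimal sketch of that difference. It uses a hypothetical one-layer stand-in model, a smaller r, and plain sums instead of the MSE losses, but the graph structure is the same as in the question:

import torch
from torch import nn
from torch.autograd import grad

model = nn.Linear(1, 1)
r = torch.linspace(0.95, 3, 5, requires_grad=True).view(-1, 1)

# Case 1: inputs is r itself (a leaf tensor), so every iteration builds a fresh graph.
inputs = r
for _ in range(2):
    E_pred = model(inputs)
    F_pred = -grad(E_pred.sum(), r, create_graph=True)[0]
    (E_pred.sum() + F_pred.sum()).backward()  # works on every iteration

# Case 2: inputs = 1 * r is a computed node created once, outside the loop.
# The second pass has to backward through that 1 * r node again, but its part of
# the graph was already freed by the first backward(), so the RuntimeError is raised.
inputs = 1 * r
for _ in range(2):
    E_pred = model(inputs)
    F_pred = -grad(E_pred.sum(), r, create_graph=True)[0]
    (E_pred.sum() + F_pred.sum()).backward()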

I think it would be good to set r's requires_grad to False. If requires_grad is set to True, the tensor is treated as something whose value should be updated through its gradient, which should only be the case for parameters; the input does not need to have its value changed. Check it out with the code below.

Code:

# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=False).view(-1, 1)
E = 1 / r
inputs = 1 * r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    optimizer.zero_grad()
    error = loss(E_pred, E.data)
    error.backward()
    optimizer.step()

print(model.forward(inputs))

If you want only r to have requires_grad set to True, use the following code:

# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=True).view(-1, 1)
with torch.no_grad():
    E = 1 / r
    inputs = 1 * r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    optimizer.zero_grad()
    error = loss(E_pred, E.data)
    error.backward()
    optimizer.step()

print(model.forward(inputs))

When you use inputs = 1*r, back-propagation has to compute gradients through the inputs node, and that is where the problem occurs. If you want to backward through that node a second time, you should use retain_graph=True, i.e. backward(retain_graph=True). After the gradients are computed inside the loop, the intermediate values of the graph are freed; on the next iteration it tries to backward through the inputs node (which was created outside the loop) again, but those intermediate values are gone, hence the error.

# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=True).view(-1, 1)
E = 1 / r
F = -grad(E.sum(), r)[0]

# with torch.no_grad():
inputs = 1*r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    F_pred = -grad(E_pred.sum(), r, create_graph=True)[0]

    optimizer.zero_grad()
    error = loss(E_pred, E.data) + loss(F_pred, F.data)
    error.backward(retain_graph=True)  # keep the graph so the 1*r node can be backwarded through again
    optimizer.step()

Here is the discussion on this topic.
