
PyTorch Autograd for Regression

Another PyTorch newbie here, trying to understand the computational graph and autograd.

I'm training the following model on a potential energy curve and the corresponding force.

import torch
from torch import nn
from torch.autograd import grad

model = nn.Sequential(
    nn.Linear(1, 32),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1)
)

optimizer = torch.optim.Adam(model.parameters())
loss = nn.MSELoss()
# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=True).view(-1, 1)
E = 1 / r
F = -grad(E.sum(), r)[0]

inputs = r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    F_pred = -grad(E_pred.sum(), r, create_graph=True, retain_graph=True)[0]

    optimizer.zero_grad()
    error = loss(E_pred, E.data) + loss(F_pred, F.data)
    error.backward()
    optimizer.step()

However, if I change inputs = r to inputs = 1 * r, the training loop breaks with the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Could you please explain why this happens?

This error occurs when backward is executed a second time through the same graph. Here is an example:

output = model(x)
loss = criterion(output, label)

optimizer.zero_grad()
loss.backward()                    # frees the saved tensors of the graph built by model(x)

loss2 = criterion(output, label)   # reuses `output`, i.e. the same graph
loss2.backward()                   # RuntimeError: trying to backward through the graph a second time

optimizer.step()

As you can see in the following code, if you just assign r to inputs, you get a shallow copy: inputs is the same tensor, so when the value of r changes, the value of inputs changes with it. However, if you multiply by 1, a new tensor is created (effectively a deep copy of the values), and inputs does not change even if r is modified.

r = torch.linspace(0.95, 3, 50).view(-1, 1)

inputs_1 = r       # same tensor as r
inputs_2 = 1 * r   # new tensor holding a copy of the values
r[0] = 100

print(inputs_1)    # first element is now 100.
print(inputs_2)    # first element is still 0.9500
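
The copy behaviour is one difference; for autograd the more important difference is what gets recorded in the computational graph. Below is a minimal sketch of my own (with requires_grad=True and the .view dropped so that r stays a leaf tensor):

import torch

r = torch.linspace(0.95, 3, 5, requires_grad=True)  # a leaf tensor in this sketch

inputs_1 = r        # same tensor object: still a leaf, no grad_fn of its own
inputs_2 = 1 * r    # new tensor produced by an operation recorded in the graph

print(inputs_1 is r)     # True
print(inputs_1.grad_fn)  # None
print(inputs_2.grad_fn)  # <MulBackward0 object at ...> -- the node that later gets freed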

Also note that E.data has requires_grad == False, so the error cannot be coming from the E target; it must be caused by inputs. In addition, optimizer.zero_grad() resets only the gradients of the model's parameters; it does not reset the gradient of E or inputs.

print(E.data.requires_grad) # False
# You want to update only the parameters of the model......
optimizer = torch.optim.Adam(model.parameters())
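
To see that optimizer.zero_grad() only touches the model's parameters, here is a small sketch of my own (net is a hypothetical stand-in for the question's model):

import torch
from torch import nn

net = nn.Linear(1, 1)                      # hypothetical stand-in for the real model
opt = torch.optim.Adam(net.parameters())

# r is made a leaf explicitly so that r.grad is populated by backward()
r = torch.linspace(0.95, 3, 5).view(-1, 1).requires_grad_()
net(r).sum().backward()

opt.zero_grad()
print(net.weight.grad)  # zeroed (or None, depending on your version's set_to_none default)
print(r.grad)           # unchanged: zero_grad() never resets the gradient of the input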

As I said before, inputs = r is a shallow copy and inputs = 1 * r is a deep copy, which leads to the following difference:

  1. In the shallow-copy case, inputs is r itself, so each epoch rebuilds the graph directly from r; the gradient simply accumulates on r and no error occurs.

  2. However, 1 * r is a computed value whose graph node is created only once, outside the loop, so an error occurs when backward runs through it a second time (see the sketch after this list).
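
Here is a minimal sketch of that difference, using my own reduced setup (net is a hypothetical stand-in for the question's model, not the original code):

import torch
from torch import nn

net = nn.Linear(1, 1)   # hypothetical stand-in for the question's model
r = torch.linspace(0.95, 3, 5, requires_grad=True).view(-1, 1)

# Case 1: feed r itself -- the graph is rebuilt from r on every iteration.
for _ in range(2):
    net(r).sum().backward()          # works every time

# Case 2: feed a tensor computed from r once, outside the loop.
inputs = 1 * r                       # this multiplication node is created only once
for _ in range(2):
    net(inputs).sum().backward()     # 2nd iteration: RuntimeError, "1 * r" was already freed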

I think it would be good to set r's requires_grad to False. requires_grad=True is meant for tensors whose values are updated through their gradients, which should only be the parameters; the input does not need to have its value changed. Check it out with the code below.

Code:

# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=False).view(-1, 1)
E = 1 / r
inputs = 1 * r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    optimizer.zero_grad()
    error = loss(E_pred, E.data)
    error.backward()
    optimizer.step()

print(model.forward(inputs))

If you want only r to have requires_grad set to True, use the following code:

# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=True).view(-1, 1)
with torch.no_grad():
    E = 1 / r
    inputs = 1 * r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    optimizer.zero_grad()
    error = loss(E_pred, E.data)
    error.backward()
    optimizer.step()

print(model.forward(inputs))

When you use inputs = 1 * r, back-propagation also has to compute gradients through the node that produced inputs, and that is where the problem occurs. The multiplication is performed once, outside the loop, and its saved intermediate values are freed after the first backward() inside the loop; on the next epoch autograd has to go through that node again, but the intermediate values are no longer there, hence the error. So if you want to backpropagate through that part of the graph a second time, you should use retain_graph=True, i.e. backward(retain_graph=True).

# generate data
r = torch.linspace(0.95, 3, 50, requires_grad=True).view(-1, 1)
E = 1 / r
F = -grad(E.sum(), r)[0]

inputs = 1 * r

for epoch in range(10**3):
    E_pred = model.forward(inputs)
    F_pred = -grad(E_pred.sum(), r, create_graph=True)[0]

    optimizer.zero_grad()
    error = loss(E_pred, E.data) + loss(F_pred, F.data)
    error.backward(retain_graph=True)  # keep the graph of inputs = 1 * r for the next epoch
    optimizer.step()

Here is the discussion on this topic.
