
How to use PyTorch's autograd efficiently with tensors?

In my previous question I found out how to differentiate with PyTorch's autograd. It worked:

#autograd
import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(1, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.out(x)
        return x

nx = net_x()
r = torch.tensor([1.0], requires_grad=True)
print('r', r)
y = nx(r)
print('y', y)
print('')
for i in range(y.shape[0]):
    # prints the vector (dy_i/dr_0, dy_i/dr_1, ... dy_i/dr_n)
    print(grad(y[i], r, retain_graph=True))

>>>
r tensor([1.], requires_grad=True)
y tensor([ 0.1698, -0.1871, -0.1313, -0.2747], grad_fn=<AddBackward0>)

(tensor([-0.0124]),)
(tensor([-0.0952]),)
(tensor([-0.0433]),)
(tensor([-0.0099]),)

The problem I'm running into now is that I have to differentiate a very large tensor, and iterating over it the way I currently do (for i in range(y.shape[0])) takes forever. The reason I iterate is that, as I understand it, grad only knows how to propagate gradients from a scalar tensor, and y is not one, so I need to compute the gradient with respect to each coordinate of y separately.
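A minimal sketch of one way to drop the Python loop, assuming a recent PyTorch release in which torch.autograd.grad accepts the is_grads_batched flag (it batches the backward passes with an experimental vmap under the hood). Each row of an identity matrix acts as the grad_outputs for one output coordinate, so a single call returns every dy_i/dr:

#one call instead of a loop: row i of the result is dy_i/dr
rows = torch.autograd.grad(y, r, grad_outputs=torch.eye(y.shape[0]),
                           is_grads_batched=True, retain_graph=True)[0]
print(rows)  # shape (4, 1)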
I know TensorFlow is capable of differentiating tensors, from here:

tf.gradients(
    ys, xs, grad_ys=None, name='gradients', gate_gradients=False,
    aggregation_method=None, stop_gradients=None,
    unconnected_gradients=tf.UnconnectedGradients.NONE
)
"ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys and for x in xs."
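The summed behaviour described there has a direct counterpart in PyTorch. A minimal sketch, reusing y and r from the snippet above: an explicit grad_outputs of ones performs one vector-Jacobian product, which returns the sum over the outputs rather than each dy_i/dr separately:

#one backward pass, but the ones vector sums over the outputs (as tf.gradients does)
summed = grad(y, r, grad_outputs=torch.ones_like(y), retain_graph=True)
print(summed)  # equals the sum of the four per-coordinate gradients printed above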

I was hoping there is a similarly efficient way to differentiate tensors in PyTorch.

For example:

#illustrative pseudocode:
a = range(100)
b = range(100)
c = range(100)
d = range(100)
my_tensor = torch.tensor([a,b,c,d])

t = range(100)

#derivative = grad(my_tensor, t) --> not working

#Instead what I'm currently doing:
for i in range(len(t)):
    a_grad = grad(a[i],t[i], retain_graph=True)
    b_grad = grad(b[i],t[i], retain_graph=True)
    #etc.

I was told it might work if I could run autograd on the forward pass rather than the backward pass, but from here it doesn't seem to be a feature PyTorch currently has.
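For what it's worth, forward-mode autograd has since been added to PyTorch as a beta feature (torch.autograd.forward_ad, roughly from version 1.11 on). For a model with a single scalar input, one dual-number forward pass produces every dy_i/dr at once; a minimal sketch, assuming forward AD covers the ops used by this network:

import torch.autograd.forward_ad as fwAD

#one forward pass with tangent dr = 1: the tangent of the output holds dy_i/dr for all i
with fwAD.dual_level():
    dual_r = fwAD.make_dual(torch.tensor([1.0]), torch.ones(1))
    _, dy_dr = fwAD.unpack_dual(nx(dual_r))
print(dy_dr)  # shape (4,): (dy_0/dr, ..., dy_3/dr)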

Update 1:
@jodag mentioned that what I'm looking for might just be the diagonal of the Jacobian. I'm following the link he attached and trying the faster method. However, it doesn't seem to work and gives me the error RuntimeError: grad can be implicitly created only for scalar outputs, which grad raises when it is called on a non-scalar output without an explicit grad_outputs. Code:

nx = net_x()
x = torch.rand(10, requires_grad=True)
x = torch.reshape(x, (10,1))
x = x.unsqueeze(1).repeat(1, 4, 1)
y = nx(x)
#fails here: grad() is given a non-scalar output and no grad_outputs
#(it also returns a tuple, which the outer torch.diagonal would not accept)
dx = torch.diagonal(torch.autograd.grad(torch.diagonal(y, 0, -2, -1), x), 0, -2, -1)

I believe I solved it using @jodag's suggestion: simply compute the Jacobian and take its diagonal.
Consider the following network:

import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(1, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4) #a,b,c,d

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = self.out(x)
        return x

nx = net_x()

#input
t = torch.tensor([1.0, 2.0, 3.2], requires_grad = True) #input vector
t = torch.reshape(t, (3,1)) #reshape for batch

My approach so far has been to iterate over the input, since grad wants a scalar value, as mentioned above:

#method 1
for timestep in t:
    y = nx(timestep) 
    print(grad(y[0],timestep, retain_graph=True)) #0 for the first vector (i.e "a"), 1 for the 2nd vector (i.e "b")

>>>
(tensor([-0.0142]),)
(tensor([-0.0517]),)
(tensor([-0.0634]),)

Using the diagonal of the Jacobian seems more efficient and gives the same results:

#method 2
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t)
dx = torch.diagonal(torch.diagonal(dx, 0, -1), 0)[0] #first vector
#dx = torch.diagonal(torch.diagonal(dx, 1, -1), 0)[0] #2nd vector
#dx = torch.diagonal(torch.diagonal(dx, 2, -1), 0)[0] #3rd vector
#dx = torch.diagonal(torch.diagonal(dx, 3, -1), 0)[0] #4th vector
dx

>>>
tensor([-0.0142, -0.0517, -0.0634])
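As a follow-up, all four per-output diagonals can be pulled out of the same Jacobian in one call instead of one offset at a time; a minimal sketch, where row k of the result is the diagonal for the k-th output:

#the full Jacobian has shape (3, 4, 3, 1); after squeezing the trailing dim,
#taking the diagonal over the two batch dims leaves a (4, 3) tensor
jac = torch.autograd.functional.jacobian(nx, t)
all_diags = torch.diagonal(jac.squeeze(-1), dim1=0, dim2=2)
all_diags[0]  # same values as the "first vector" result above

Recent versions of torch.autograd.functional.jacobian also accept an experimental vectorize=True flag, which batches the backward passes and may speed this up further for larger inputs.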
