How to use PyTorch's autograd efficiently with tensors?

In my previous question I found out how to use PyTorch's autograd to differentiate. And it worked:

#autograd
import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module): 
        def __init__(self):
            super(net_x, self).__init__()
            self.fc1=nn.Linear(1, 20) 
            self.fc2=nn.Linear(20, 20)
            self.out=nn.Linear(20, 4) 

        def forward(self, x):
            x=torch.tanh(self.fc1(x))
            x=torch.tanh(self.fc2(x))
            x=self.out(x)
            return x

nx = net_x()
r = torch.tensor([1.0], requires_grad=True)
print('r', r)
y = nx(r)
print('y', y)
print('')
for i in range(y.shape[0]):
    # prints the vector (dy_i/dr_0, dy_i/dr_1, ... dy_i/dr_n)
    print(grad(y[i], r, retain_graph=True))

>>>
r tensor([1.], requires_grad=True)
y tensor([ 0.1698, -0.1871, -0.1313, -0.2747], grad_fn=<AddBackward0>)

(tensor([-0.0124]),)
(tensor([-0.0952]),)
(tensor([-0.0433]),)
(tensor([-0.0099]),)

The problem I currently have is that I have to differentiate a very large tensor, and iterating through it the way I'm doing now (for i in range(y.shape[0])) takes forever. The reason I'm iterating is that, from my understanding, grad only knows how to propagate gradients from a scalar tensor, which y is not. So I need to compute the gradients with respect to each coordinate of y.
I know that TensorFlow is capable of differentiating tensors, from here:

tf.gradients(
    ys, xs, grad_ys=None, name='gradients', gate_gradients=False,
    aggregation_method=None, stop_gradients=None,
    unconnected_gradients=tf.UnconnectedGradients.NONE
)
"ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor, holding the gradients received by the ys. The list must be the same length as ys.

gradients() adds ops to the graph to output the derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys and for x in xs."
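
For what it's worth, the sum(dy/dx) that tf.gradients describes is just a vector-Jacobian product, and PyTorch can already do that in one call through the grad_outputs argument; the catch is that it returns the sum over the output coordinates, not each gradient separately. A small sketch with the network from above to show the difference:

# reusing net_x from the first snippet
nx = net_x()
r = torch.tensor([1.0], requires_grad=True)
y = nx(r)

# grad_outputs=ones turns this into a vector-Jacobian product:
# one backward pass, but the result is sum_i dy_i/dr, not the per-coordinate gradients
summed = grad(y, r, grad_outputs=torch.ones_like(y))
print(summed)  # a 1-element tensor wrapped in a tuple: the sum over the four outputs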

I was hoping that there's a more efficient way to differentiate tensors in PyTorch.

For example:

# schematic example -- a, b, c and d stand for four output vectors that each depend on t
a = range(100)
b = range(100)
c = range(100)
d = range(100)
my_tensor = torch.tensor([a, b, c, d])

t = range(100)

#derivative = grad(my_tensor, t) --> not working

# Instead, what I'm currently doing:
for i in range(len(t)):
    a_grad = grad(a[i], t[i], retain_graph=True)
    b_grad = grad(b[i], t[i], retain_graph=True)
    #etc.
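
One thing that looks promising, if I read the docs right, is that newer PyTorch builds (1.11+) let grad take a whole batch of grad_outputs at once via is_grads_batched, which should compute every per-output gradient in a single call instead of a Python loop. A minimal sketch against the net_x network above, assuming that argument exists in the installed version:

# assumes PyTorch >= 1.11, where torch.autograd.grad accepts is_grads_batched
nx = net_x()
r = torch.tensor([1.0], requires_grad=True)
y = nx(r)                      # shape (4,)

# one one-hot row per output coordinate, batched along dim 0
basis = torch.eye(y.shape[0])  # shape (4, 4)

# returns a tuple; the first element has shape (4, 1), row i being dy_i/dr
(per_output_grads,) = grad(y, r, grad_outputs=basis, is_grads_batched=True)
print(per_output_grads)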

I was told that it might work if I could run autograd on the forward pass rather than the backward pass, but from here it seems like it's not currently a feature PyTorch has.
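
That said, recent releases do expose a beta forward-mode API under torch.autograd.forward_ad (dual numbers). Since my input here is a single scalar, one forward-mode pass with a unit tangent would give the whole column dy/dr at once, assuming the ops in net_x support forward AD in the installed version. A hedged sketch:

import torch.autograd.forward_ad as fwAD

nx = net_x()
primal = torch.tensor([1.0])
tangent = torch.ones_like(primal)   # direction to differentiate along

with fwAD.dual_level():
    dual_input = fwAD.make_dual(primal, tangent)
    out = nx(dual_input)
    # the tangent of the output is the JVP J @ tangent; with a 1-d input
    # that is the full column of the Jacobian, i.e. all four dy_i/dr at once
    jvp = fwAD.unpack_dual(out).tangent

print(jvp)  # shape (4,)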

Update 1:
@jodag mentioned that what I'm looking for might be just the diagonal of the Jacobian. I'm following the link he attached and trying out the faster method. However, this doesn't seem to work and gives me an error: RuntimeError: grad can be implicitly created only for scalar outputs. Code:

nx = net_x()
x = torch.rand(10, requires_grad=True)
x = torch.reshape(x, (10,1))
x = x.unsqueeze(1).repeat(1, 4, 1)
y = nx(x)
dx = torch.diagonal(torch.autograd.grad(torch.diagonal(y, 0, -2, -1), x), 0, -2, -1)
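
As far as I can tell, the RuntimeError comes from calling torch.autograd.grad on a non-scalar output without a grad_outputs argument (and grad returns a tuple, so the outer torch.diagonal would also need grad(...)[0]). A minimal sketch of the change that makes the call go through; since each replicated copy of the input only feeds its own diagonal entry, the all-ones weighting should recover the per-component derivatives here, though I ended up going the Jacobian route below instead:

nx = net_x()
x = torch.rand(10, requires_grad=True)
x = torch.reshape(x, (10, 1))
x = x.unsqueeze(1).repeat(1, 4, 1)     # shape (10, 4, 1): input replicated once per output
y = nx(x)                              # shape (10, 4, 4)
diag_y = torch.diagonal(y, 0, -2, -1)  # shape (10, 4): output k of the k-th replica

# grad_outputs is required for non-scalar outputs; grad(...) returns a tuple
dx = torch.autograd.grad(diag_y, x, grad_outputs=torch.ones_like(diag_y))[0]
print(dx.shape)                        # (10, 4, 1)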

I believe I solved it using @jodag's advice -- simply calculate the Jacobian and take the diagonal.
Consider the following network:

import torch
from torch.autograd import grad
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module): 
        def __init__(self):
            super(net_x, self).__init__()
            self.fc1=nn.Linear(1, 20) 
            self.fc2=nn.Linear(20, 20)
            self.out=nn.Linear(20, 4) #a,b,c,d

        def forward(self, x):
            x=torch.tanh(self.fc1(x))
            x=torch.tanh(self.fc2(x))
            x=self.out(x)
            return x

nx = net_x()

#input
t = torch.tensor([1.0, 2.0, 3.2], requires_grad = True) #input vector
t = torch.reshape(t, (3,1)) #reshape for batch

My approach so far has been to iterate through the input, since grad wants a scalar value as mentioned above:

#method 1
for timestep in t:
    y = nx(timestep) 
    print(grad(y[0],timestep, retain_graph=True)) #0 for the first vector (i.e "a"), 1 for the 2nd vector (i.e "b")

>>>
(tensor([-0.0142]),)
(tensor([-0.0517]),)
(tensor([-0.0634]),)

Using the diagonal of the Jacobian seems more efficient and gives the same results:

#method 2
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t)
dx = torch.diagonal(torch.diagonal(dx, 0, -1), 0)[0] #first vector
#dx = torch.diagonal(torch.diagonal(dx, 1, -1), 0)[0] #2nd vector
#dx = torch.diagonal(torch.diagonal(dx, 2, -1), 0)[0] #3rd vector
#dx = torch.diagonal(torch.diagonal(dx, 3, -1), 0)[0] #4th vector
dx

>>>
tensor([-0.0142, -0.0517, -0.0634])
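
Two small refinements on top of this seem to work: torch.autograd.functional.jacobian takes an experimental vectorize=True flag that avoids doing one backward pass per output element, and all four derivative vectors can be read off with a single diagonal over the two batch dimensions instead of one call per offset. A sketch, assuming the same nx and t as above:

#method 2, vectorized (sketch); drop vectorize=True if your version complains
dx = torch.autograd.functional.jacobian(lambda t_: nx(t_), t, vectorize=True) #shape (3, 4, 3, 1)
#diagonal over the two batch dimensions (dims 0 and 2):
#row k holds d(output_k)/dt for each of the 3 inputs
all_rows = torch.diagonal(dx, dim1=0, dim2=2).squeeze(1) #shape (4, 3)
all_rows[0] #same values as the first vector above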
