Why does PyTorch autograd need a scalar?
I am working through "Deep Learning for Coders with fastai & PyTorch". Chapter 4 introduces the autograd function from the PyTorch library on a trivial example.
from torch import tensor  # in the book this comes in via fastai's imports

x = tensor([3.,4.,10.]).requires_grad_()
def f(q): return sum(q**2)
y = f(x)
y.backward()
My question boils down to this: the result of y = f(x) is tensor(125., grad_fn=<AddBackward0>), but what does that even mean? Why would I sum the values of three completely different inputs?
I get that using .backward() in this case is shorthand for .backward(tensor([1.,1.,1.])), but I don't see how summing 3 unrelated numbers in a list helps get the gradient for anything. What am I not understanding?
I'm not looking for a grad-level explanation here. The subtitle of the book I'm using is AI Applications Without a PhD. My experience with gradients from school is that I should be getting a FUNCTION back, but I understand that isn't the case with autograd. A graph of this short example would be helpful, but the ones I see online usually include too many parameters or weights and biases to be useful, and my mind gets lost in the paths.
TLDR; the derivative of a sum of functions is the sum of their derivatives
Let x be your input vector made of components x_i (where i in [0,n]), y = x**2, and L = sum(y_i). You are looking to compute dL/dx, a vector of the same size as x whose components are the dL/dx_j (where j in [0,n]).
For j in [0,n], dL/dx_j is simply dy_j/dx_j (the derivative of a sum is the sum of the derivatives, and only the j-th one is non-zero with respect to x_j), which is d(x_j**2)/dx_j, i.e. 2*x_j. Therefore, dL/dx = [2*x_j for j in [0,n]].
This is the result you get in x.grad when computing the gradient of x either as:
y = f(x)
y.backward()
or as the gradient of each component of x separately:
y = x**2
y.backward(torch.ones_like(x))
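Putting the two routes side by side makes the equivalence concrete. A minimal sketch (assuming plain torch, not the fastai imports the book uses):

```python
import torch

# Route 1: reduce y to a scalar with sum(), then call backward() with no argument.
x1 = torch.tensor([3., 4., 10.], requires_grad=True)
(x1 ** 2).sum().backward()

# Route 2: keep y as a vector and pass the "seed" gradient of ones explicitly.
x2 = torch.tensor([3., 4., 10.], requires_grad=True)
(x2 ** 2).backward(torch.ones_like(x2))

print(x1.grad)  # tensor([ 6.,  8., 20.])
print(x2.grad)  # identical values
```

Both end up with x.grad == 2*x, because backpropagating a scalar sum and backpropagating the vector y against a vector of ones compute the same vector-Jacobian product.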