
theano gradient with respect to matrix row

As the question suggests, I would like to compute the gradient with respect to a matrix row. In code:

import numpy.random as rng
import theano.tensor as T
from theano import function

t_x = T.matrix('X')
t_w = T.matrix('W')
t_y = T.dot(t_x, t_w.T)

# t_g = T.grad(t_y[0,0], t_x[0])  # my wish, but raises DisconnectedInputError
t_g = T.grad(t_y[0,0], t_x)       # no problems, but a lot of unnecessary zeros

f = function([t_x, t_w], [t_y, t_g])
y,g = f(rng.randn(2,5), rng.randn(7,5))

As the comments indicate, the code works without any problems when I compute the gradient with respect to the entire matrix. In this case the gradient is computed correctly, but the result has non-zero entries only in row 0 (because the other rows of x obviously do not appear in the equations for the first row of y).
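To make the structure of that result concrete, here is a quick check continuing the snippet above (assuming the failing line stays commented out so the code runs):

import numpy as np

# g has the same shape as X, i.e. (2, 5), but only row 0 can be non-zero,
# because y[0, 0] depends solely on the first row of X
print(np.count_nonzero(g[1:]))  # prints 0
print(g[0])                     # equals the first row of W, the gradient of interest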

I have found this question, suggesting to store all rows of the matrix in separate variables and build graphs from these variables. In my setting though, I have no idea how many rows there might be in X.

Does anybody have an idea how to get the gradient with respect to a single row of a matrix, or how I could omit the extra zeros in the output? If anybody has suggestions on how an arbitrary number of vectors can be stacked, that should work as well, I guess.

I realised that it is possible to get rid of the zeros when computing derivatives with respect to the entries in row i:

t_g = T.grad(t_y[i,0], t_x)[i]

and for computing the Jacobian, I found out that

t_g = T.jacobian(t_y[i], t_x)[:,i]

does the trick. However, it seems to have a rather heavy impact on computation speed.
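For completeness, here is how both variants can be compiled with a symbolic row index, so a single compiled function serves any row (a minimal sketch under the setup above; t_i is a name introduced here for illustration):

t_i = T.iscalar('i')  # symbolic row index

t_g_row = T.grad(t_y[t_i, 0], t_x)[t_i]      # gradient of y[i, 0] wrt row i of X
t_J_row = T.jacobian(t_y[t_i], t_x)[:, t_i]  # Jacobian of row i of Y wrt row i of X

f_row = function([t_x, t_w, t_i], [t_g_row, t_J_row])
g_row, J_row = f_row(rng.randn(2, 5), rng.randn(7, 5), 0)  # g_row: (5,), J_row: (7, 5)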


It would also be possible to approach this problem mathematically. The Jacobian of the matrix multiplication t_y with respect to t_x is simply the transpose of t_w.T, which is t_w in this case (the transpose of the transpose is the original matrix). Thus, the computation is as simple as

t_g = t_w
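This claim is easy to verify numerically (a quick sketch reusing the variables defined above):

import numpy as np

t_J = T.jacobian(t_y[0], t_x)[:, 0]   # Jacobian of the first row of Y wrt the first row of X
f_check = function([t_x, t_w], t_J)

x = rng.randn(2, 5)
w = rng.randn(7, 5)
print(np.allclose(f_check(x, w), w))  # True: the Jacobian is exactly W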
