
theano gradient with respect to matrix row

As the question suggests, I would like to compute the gradient with respect to a matrix row. In code:

import numpy.random as rng
import theano.tensor as T
from theano import function

t_x = T.matrix('X')
t_w = T.matrix('W')
t_y = T.dot(t_x, t_w.T)

# t_g = T.grad(t_y[0,0], t_x[0])  # my wish, but raises DisconnectedInputError
t_g = T.grad(t_y[0,0], t_x)       # no problems, but a lot of unnecessary zeros

f = function([t_x, t_w], [t_y, t_g])
y,g = f(rng.randn(2,5), rng.randn(7,5))

As the comments indicate, the code works without any problems when I compute the gradient with respect to the entire matrix. In this case the gradient is computed correctly, but the result has non-zero entries only in row 0 (because the other rows of x obviously do not appear in the equations for the first row of y).
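To make the structure of that result concrete, here is a quick check continuing the snippet above (assuming the failing line stays commented out so the code runs):

import numpy as np

# g has the same shape as X, i.e. (2, 5), but only row 0 can be non-zero,
# because y[0, 0] depends solely on the first row of X
print(np.count_nonzero(g[1:]))  # prints 0
print(g[0])                     # equals the first row of W, the gradient of interest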

I have found this question, suggesting to store all rows of the matrix in separate variables and build graphs from these variables. In my setting though, I have no idea how many rows there might be in X.

Does anybody have an idea how to get the gradient with respect to a single row of a matrix, or how I could omit the extra zeros in the output? If anybody has suggestions on how an arbitrary number of vectors can be stacked, that should work as well, I guess.

I realised that it is possible to get rid of the zeros when computing derivatives with respect to the entries in row i:

t_g = T.grad(t_y[i,0], t_x)[i]

and for computing the Jacobian, I found out that

t_g = T.jacobian(t_y[i], t_x)[:,i]

does the trick. However, it seems to have a rather heavy impact on computation speed.
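For completeness, here is how both variants can be compiled with a symbolic row index, so a single compiled function serves any row (a minimal sketch under the setup above; t_i is a name introduced here for illustration):

t_i = T.iscalar('i')  # symbolic row index

t_g_row = T.grad(t_y[t_i, 0], t_x)[t_i]      # gradient of y[i, 0] wrt row i of X
t_J_row = T.jacobian(t_y[t_i], t_x)[:, t_i]  # Jacobian of row i of Y wrt row i of X

f_row = function([t_x, t_w, t_i], [t_g_row, t_J_row])
g_row, J_row = f_row(rng.randn(2, 5), rng.randn(7, 5), 0)  # g_row: (5,), J_row: (7, 5)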


It would also be possible to approach this problem mathematically. The Jacobian of the matrix multiplication t_y with respect to t_x is simply the transpose of t_w.T, which is t_w in this case (the transpose of the transpose is the original matrix). Thus, the computation is as simple as

t_g = t_w
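This claim is easy to verify numerically (a quick sketch reusing the variables defined above):

import numpy as np

t_J = T.jacobian(t_y[0], t_x)[:, 0]   # Jacobian of the first row of Y wrt the first row of X
f_check = function([t_x, t_w], t_J)

x = rng.randn(2, 5)
w = rng.randn(7, 5)
print(np.allclose(f_check(x, w), w))  # True: the Jacobian is exactly W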
