Tensorflow gradient with respect to a matrix

Question

Just for context, I'm trying to implement a gradient descent algorithm with Tensorflow.

I have a matrix X

[ x1 x2 x3 x4 ]
[ x5 x6 x7 x8 ]

which I multiply by a feature vector Y to get Z

      [ y1 ]
Z = X [ y2 ]  = [ z1 ]
      [ y3 ]    [ z2 ]
      [ y4 ]

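Then I pass Z through a softmax function and take the log. I'll refer to the output matrix as W.

All of this is implemented as follows (with a little boilerplate added so that it runs)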

import tensorflow as tf

sess = tf.Session()
num_features = 4
num_actions = 2

policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder("float", (num_features, 1))
# X (policy_matrix) times Y (state_ph) gives Z
action_linear = tf.matmul(policy_matrix, state_ph)
action_probs = tf.nn.softmax(action_linear, axis=0)
# W: log of the softmax probabilities
action_problogs = tf.log(action_probs)

W (which corresponds to action_problogs) looks like

[ w1 ]
[ w2 ]

I want to find the gradient of w1 with respect to the matrix X, that is, I'd like to calculate

          [ d/dx1 w1 ]
d/dX w1 =      .
               .
          [ d/dx8 w1 ]

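(preferably still shaped like a matrix so that I can add it to X, but I'm really not concerned about that)

I was hoping that tf.gradients would do the trick. I calculated the "gradient" like so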

problog_gradient = tf.gradients(action_problogs, policy_matrix)

However, when I inspect problog_gradient, here's what I get

[<tf.Tensor 'foo_4/gradients/foo_4/MatMul_grad/MatMul:0' shape=(2, 4) dtype=float32>]

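Note that this has exactly the same shape as X, but it really shouldn't. I was hoping to get a list of two gradients, each with respect to 8 elements. I suspect that I'm instead getting two gradients, each with respect to four elements.

I'm very new to tensorflow, so I'd appreciate an explanation of what's going on and how I might achieve the behavior I want.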

Answer 1

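tf.gradients expects a scalar function, so by default it sums up the entries. That is the default behavior because all of the gradient descent algorithms need that type of functionality, and stochastic gradient descent (or a variation of it) is the preferred method inside Tensorflow. You won't find any of the more advanced algorithms (like BFGS) because they haven't been implemented yet (and they would require a true Jacobian, which also hasn't been implemented). For what it's worth, here is a functioning Jacobian implementation that I wrote: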

import tensorflow as tf

def map(f, x, dtype=None, parallel_iterations=10):
    '''
    Apply f to each of the elements in x using the specified number of parallel iterations.

    Important points:
    1. By "elements in x", we mean that we will be applying f to x[0],...x[tf.shape(x)[0]-1].
    2. The output size of f(x[i]) can be arbitrary. However, if the dtype of that output
       is different than the dtype of x, then you need to specify that as an additional argument.
    '''
    if dtype is None:
        dtype = x.dtype

    n = tf.shape(x)[0]
    loop_vars = [
        tf.constant(0, n.dtype),
        tf.TensorArray(dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, f(x[j]))),
        loop_vars,
        parallel_iterations=parallel_iterations
    )
    return fx.stack()

def jacobian(fx, x, parallel_iterations=10):
    '''
    Given a tensor fx, which is a function of x, vectorize fx (via tf.reshape(fx, [-1])),
    and then compute the jacobian of each entry of fx with respect to x.
    Specifically, if x has shape (m,n,...,p), and fx has L entries (tf.size(fx)=L), then
    the output will have shape (L,m,n,...,p), where output[i] has shape (m,n,...,p) and each of
    its entries is the derivative of the i-th entry of the flattened fx wrt the corresponding element of x.
    '''
    return map(lambda fxi: tf.gradients(fxi, x)[0],
               tf.reshape(fx, [-1]),
               dtype=x.dtype,
               parallel_iterations=parallel_iterations)

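While this implementation works, it does not work when you try to nest it. For instance, if you try to compute the Hessian by using jacobian( jacobian( ... )), then you get some strange errors. This is being tracked as Issue 675. I am still awaiting a response on why this throws an error. I believe there is a deep-seated bug in either the while loop implementation or the gradient implementation, but I really have no idea.

Anyway, if you just need a Jacobian, try the code above.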
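For the small, fixed-size case in the question, a simpler route than a general Jacobian might be to call tf.gradients once per output entry. The following is only a sketch under some assumptions: it uses the tensorflow.compat.v1 module (available in TF 1.14+ and 2.x) to get graph mode and sessions, and the all-ones input state is made up for illustration.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in the question

num_features = 4
num_actions = 2

policy_matrix = tf.get_variable("params", (num_actions, num_features))
state_ph = tf.placeholder(tf.float32, (num_features, 1))
action_linear = tf.matmul(policy_matrix, state_ph)
action_probs = tf.nn.softmax(action_linear, axis=0)
action_problogs = tf.log(action_probs)

# One tf.gradients call per scalar output w_i.
# Each result has the same shape as X, i.e. (2, 4).
per_action_grads = [
    tf.gradients(action_problogs[i, 0], policy_matrix)[0]
    for i in range(num_actions)
]

# What the question computed: the gradient of w1 + w2.
summed_grad = tf.gradients(action_problogs, policy_matrix)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Made-up input state, for illustration only.
    feed = {state_ph: np.ones((num_features, 1), dtype=np.float32)}
    grads, total = sess.run([per_action_grads, summed_grad], feed_dict=feed)
```

Here grads[0] is the gradient of w1 with respect to X, already in matrix form, and grads[0] + grads[1] should match total up to rounding. A Python loop like this adds one gradient subgraph per output, so it only makes sense when the number of outputs is small and known when the graph is built.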

Answer 2

tf.gradients actually sums the ys and computes the gradient of that sum, which is why this issue occurs.
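The summing can be illustrated without TensorFlow. For w = log(softmax(X y)), the chain rule gives d w_i / d X[j][k] = (delta_ij - p_j) * y_k, where p = softmax(X y). The sketch below builds that full Jacobian in plain Python and then sums it over i, which is the quantity a single tf.gradients call on the whole of W corresponds to. The helper names problogs and problog_jacobian and the numeric values of X and y are made up for illustration.

```python
import math

def problogs(X, y):
    """Return w = log(softmax(X y)) as a list, using a stable log-sum-exp."""
    z = [sum(a * b for a, b in zip(row, y)) for row in X]
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]

def problog_jacobian(X, y):
    """jac[i][j][k] = d w_i / d X[j][k] = (delta_ij - p_j) * y_k."""
    p = [math.exp(w) for w in problogs(X, y)]
    n = len(X)
    return [[[((1.0 if i == j else 0.0) - p[j]) * yk for yk in y]
             for j in range(n)]
            for i in range(n)]

# Made-up values with the shapes from the question: X is 2x4, y has 4 entries.
X = [[0.1, 0.2, 0.3, 0.4],
     [0.5, 0.6, 0.7, 0.8]]
y = [1.0, 2.0, 3.0, 4.0]

# Shape (2, 2, 4): one X-shaped gradient for each of w1 and w2.
jac = problog_jacobian(X, y)

# Summing over the outputs collapses this to a single X-shaped (2, 4) matrix.
summed = [[jac[0][j][k] + jac[1][j][k] for k in range(4)] for j in range(2)]
```

jac[0] is the gradient the question asks for, while summed has the (2, 4) shape that the question observed.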

Answer 1
3 accepted 2018-02-20 07:57:40

Answer 2
0 2018-02-20 07:12:22
