Why does tf.matmul(a, b, transpose_b=True) work, but not tf.matmul(a, tf.transpose(b))?

Question

Code:

import tensorflow as tf

x = tf.ones((3, 2, 4))     # placeholder values; only the shapes matter here
y = tf.ones((3, 21, 4))
tf.matmul(x, y)                      # Doesn't work.
tf.matmul(x, y, transpose_b=True)    # This works. Shape is (3, 2, 21).
tf.matmul(x, tf.transpose(y))        # Doesn't work.

I want to know what shape y takes inside tf.matmul(x, y, transpose_b=True), so that I can understand what is really happening inside an LSTM.

Answer 1

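For tensors of rank > 2, the transpose can be defined in more than one way. The difference here lies in which axes are transposed by tf.transpose and by tf.matmul(..., transpose_b=True).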

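By default, tf.transpose does the following:

The returned tensor's dimension i will correspond to the input dimension perm[i]. If perm is not given, it is set to (n-1...0), where n is the rank of the input tensor. Hence by default, this operation performs a regular matrix transpose on 2-D input tensors.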

So in your case, it turns y into a tensor of shape (4, 21, 3), which is incompatible with x (see below).

But if you set perm=[0, 2, 1], the result is compatible:

# Works! (3, 2, 4) * (3, 4, 21) -> (3, 2, 21).
tf.matmul(x, tf.transpose(y, [0, 2, 1]))
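
The axis bookkeeping above can be checked without running TensorFlow. Here is a minimal plain-Python sketch that only tracks shapes. Note that transposed_shape is a hypothetical helper written for this illustration, not a TensorFlow function; it simply assumes the rule quoted above, that output dimension i corresponds to input dimension perm[i].

```python
def transposed_shape(shape, perm=None):
    """Shape produced by transposing a tensor of `shape` with permutation `perm`.

    Hypothetical helper for illustration only; it mirrors the rule that
    output dimension i corresponds to input dimension perm[i].
    """
    n = len(shape)
    if perm is None:
        # Default described for tf.transpose: reverse all axes, (n-1, ..., 0).
        perm = list(range(n - 1, -1, -1))
    if sorted(perm) != list(range(n)):
        raise ValueError(f"perm {perm} is not a permutation of 0..{n - 1}")
    return tuple(shape[p] for p in perm)


y_shape = (3, 21, 4)
print(transposed_shape(y_shape))             # (4, 21, 3)
print(transposed_shape(y_shape, [0, 2, 1]))  # (3, 4, 21)
```

With the default permutation the batch axis ends up last, which is why the result no longer lines up with x; swapping only the last two axes keeps the batch axis in front.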

About `tf.matmul`

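You can compute the product (a, b, c) * (a, c, d) -> (a, b, d). But this is not a tensor dot product; it is a batch operation (see this question).

In this case, a is treated as the batch size, so tf.matmul computes the matrix product (b, c) * (c, d) once per batch entry, a times in total.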
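
To make the "one matrix product per batch entry" reading concrete, here is a small plain-Python sketch on nested lists. The helpers matmul_2d and batch_matmul are hypothetical names used only for this illustration; they assume both inputs carry the same batch size and say nothing about how TensorFlow implements the operation internally.

```python
def matmul_2d(m, n):
    """Ordinary matrix product of nested-list matrices: (b, c) * (c, d) -> (b, d)."""
    if len(m[0]) != len(n):
        raise ValueError("inner dimensions do not match")
    return [[sum(m[i][k] * n[k][j] for k in range(len(n)))
             for j in range(len(n[0]))]
            for i in range(len(m))]


def batch_matmul(xs, ys):
    """One independent matrix product per batch entry: (a, b, c) * (a, c, d) -> (a, b, d)."""
    if len(xs) != len(ys):
        raise ValueError("batch sizes do not match")
    return [matmul_2d(m, n) for m, n in zip(xs, ys)]


# A batch of a=2 products, each (1, 2) * (2, 1) -> (1, 1).
xs = [[[1, 2]], [[3, 4]]]
ys = [[[5], [6]], [[7], [8]]]
print(batch_matmul(xs, ys))  # [[[17]], [[53]]]
```

The two batch entries never interact: the first result uses only the first pair of matrices, the second only the second pair.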

The batch can span more than one dimension, so this is also valid:

(a, b, c, d) * (a, b, d, e) -> (a, b, c, e)
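
The shape rule can be sketched as a small checker in plain Python. batch_matmul_shape is a hypothetical helper, not part of TensorFlow. It assumes both shapes have the same rank and identical leading batch dimensions (any broadcasting of batch dimensions is deliberately left out), and it models transpose_b as a swap of the last two axes only, which is consistent with the (3, 2, 21) result reported in the question.

```python
def batch_matmul_shape(a_shape, b_shape, transpose_b=False):
    """Result shape of a batched matrix product, or ValueError if incompatible.

    Hypothetical helper for illustration. Assumes equal rank (>= 2) and
    identical leading batch dimensions; broadcasting is not modelled.
    """
    if len(a_shape) != len(b_shape) or len(a_shape) < 2:
        raise ValueError("expected two shapes of equal rank >= 2")
    *a_batch, a_rows, a_cols = a_shape
    *b_batch, b_rows, b_cols = b_shape
    if transpose_b:
        # Assumption: only the last two axes are swapped; batch axes stay put.
        b_rows, b_cols = b_cols, b_rows
    if a_batch != b_batch:
        raise ValueError(f"batch dimensions differ: {a_batch} vs {b_batch}")
    if a_cols != b_rows:
        raise ValueError(f"inner dimensions differ: {a_cols} vs {b_rows}")
    return (*a_batch, a_rows, b_cols)


print(batch_matmul_shape((3, 2, 4), (3, 21, 4), transpose_b=True))  # (3, 2, 21)
print(batch_matmul_shape((3, 2, 4), (3, 4, 21)))                    # (3, 2, 21)
print(batch_matmul_shape((2, 3, 5, 7), (2, 3, 7, 11)))              # (2, 3, 5, 11)
```

Under the same rule, the shapes (3, 2, 4) and (4, 21, 3) are rejected because their batch dimensions differ, which matches the failure of tf.matmul(x, tf.transpose(y)) in the question.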
