
tf.matmul(X, weight) vs tf.matmul(X, tf.transpose(weight)) in tensorflow

In a standard ANN, for fully connected layers we use the following formula: tf.matmul(X, weight) + bias . This is clear to me, as we use matrix multiplication to connect the input with the hidden layer.

But in the GloVe implementation ( https://nlp.stanford.edu/projects/glove/ ) we use the following formula for the embedding multiplication: tf.matmul(W, tf.transpose(U)) . What confuses me is the tf.transpose(U) part. Why do we use tf.matmul(W, tf.transpose(U)) instead of tf.matmul(W, U) ?

It has to do with the choice of column vs row orientation for the vectors.

Note that weight is the second parameter here:

tf.matmul(X, weight)

But the first parameter, W , is here:

tf.matmul(W, tf.transpose(U))

So what you are seeing is a practical application of the following matrix transpose identity:

(AB)^T = B^T A^T
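A minimal sketch checking this identity numerically in TensorFlow (the 3x4 and 4x5 shapes are arbitrary, chosen just for illustration):

import tensorflow as tf

# Two random matrices whose shapes are compatible for A @ B.
A = tf.random.normal([3, 4])
B = tf.random.normal([4, 5])

# (AB)^T should equal B^T A^T, element for element.
lhs = tf.transpose(tf.matmul(A, B))                # shape (5, 3)
rhs = tf.matmul(tf.transpose(B), tf.transpose(A))  # shape (5, 3)

print(bool(tf.reduce_all(tf.abs(lhs - rhs) < 1e-5)))  # True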


To bring it back to your example, let's assume 10 inputs and 20 outputs.

The first approach uses row vectors. A single input X would be a 1x10 matrix, called a row vector because it has a single row. To match, the weight matrix needs to be 10x20 to produce an output of size 20.
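A quick sketch of this row-vector convention with those shapes (the names X and weight follow the question; the bias term is omitted for brevity):

import tensorflow as tf

X = tf.random.normal([1, 10])        # one input as a 1x10 row vector
weight = tf.random.normal([10, 20])  # maps 10 inputs to 20 outputs

output = tf.matmul(X, weight)        # a 1x20 row vector
print(output.shape)                  # (1, 20)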

But in the second approach the multiplication is reversed. That is a hint that everything is using column vectors. If the multiplication is reversed, then everything gets a transpose. So this example is using column vectors, so named because they have a single column.

That's why the transpose is there. The way the GloVe authors have done their notation, with the multiplication reversed, the weight matrix W must already be transposed to 20x10 instead of 10x20. And they must be expecting a 20x1 column vector for the output.

So if the input vector U is naturally a 1x10 row vector, it also has to be transposed, to a 10x1 column vector, to fit in with everything else.
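And a matching sketch of the column-vector convention used in the GloVe-style expression (assuming, as above, that U starts out as a 1x10 row vector):

import tensorflow as tf

W = tf.random.normal([20, 10])  # weight matrix already transposed: 20x10
U = tf.random.normal([1, 10])   # input still stored as a 1x10 row vector

# Transpose U into a 10x1 column vector so the shapes line up.
output = tf.matmul(W, tf.transpose(U))  # a 20x1 column vector
print(output.shape)                     # (20, 1)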


Basically you should pick row vectors or column vectors and stick with that choice throughout; the order of the multiplications and the transposition of the weights is then determined for you.

Personally I think that column vectors, as used by GloVe, are awkward and unnatural compared to row vectors. It's better to have the multiplication ordering follow the data flow ordering.
