
What does the MNIST TensorFlow tutorial mean by the matmul flipping trick?

The tutorial on MNIST for ML Beginners, in Implementing the Regression, shows how to make the regression on a single line, followed by an explanation that mentions the use of a trick (emphasis mine):

y = tf.nn.softmax(tf.matmul(x, W) + b)

First, we multiply x by W with the expression tf.matmul(x, W). This is flipped from when we multiplied them in our equation, where we had Wx, as a small trick to deal with x being a 2D tensor with multiple inputs.

What is the trick here, and why are we using it?

Well, there's no trick here. That line basically refers to the multiplication order used in the earlier single-example equation:

# For a single example, the equation puts W first:
y = Wx + b
# For a batch of examples, you change the order of multiplication
# instead of using another transpose op:
y = xW + b
# hence
y = tf.matmul(x, W) + b
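
As a quick shape check of the flipped order, here is a minimal runnable sketch (784 and 10 are the MNIST dimensions from the tutorial; the batch size of 4 and the zero-initialized variables are just illustrative choices, assuming TensorFlow 2.x eager mode):

import tensorflow as tf

x = tf.random.normal([4, 784])   # batch of inputs, shape (batch, M)
W = tf.zeros([784, 10])          # weights, shape (M, N)
b = tf.zeros([10])               # bias, shape (N,)

# Flipped order xW: the batch dimension stays in front
y = tf.nn.softmax(tf.matmul(x, W) + b)
print(y.shape)                   # (4, 10): one row of class probabilities per example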

Ok, I think the main point is that if you train in batches (i.e. train with several instances of the training set at once), TensorFlow always assumes that the zeroth dimension of x indicates the number of events per batch.

Suppose you want to map a training instance of dimension M to a target instance of dimension N. You would typically do this by multiplying x (a column vector) with an NxM matrix (and, optionally, adding a bias of dimension N, also a column vector), i.e.

y = W*x + b, where y is also a column vector.

This is perfectly alright seen from the perspective of linear algebra. But now comes the point with training in batches, i.e. training with several training instances at once. To understand this, it might be helpful not to view x (and y) as vectors of dimension M (and N), but as matrices with dimensions Mx1 (and Nx1 for y). Since TensorFlow assumes that the different training instances constituting a batch are aligned along the zeroth dimension, we get into trouble here, because the zeroth dimension is occupied by the different elements of one single instance. The trick is then to transpose the above equation (remember that transposing a product also switches the order of the two transposed factors):

y^T = x^T * W^T + b^T

This is pretty much what has been described in short within the tutorial. Note that y^T is now a matrix of dimension 1xN (practically a row vector), while x^T is a matrix of dimension 1xM (also a row vector). W^T is a matrix of dimension MxN. In the tutorial, they did not write x^T or y^T, but simply defined the placeholders according to this transposed equation. The only point that is not clear to me is why they did not define b the "transposed way". I assume that the + operator automatically broadcasts b if that is necessary in order to get the correct dimensions.
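
To see that no explicit transpose of b is needed, here is a small numpy sketch (the dimensions are hypothetical) showing that a plain length-N bias broadcasts across the transposed equation:

import numpy as np

M, N = 5, 3                 # hypothetical dimensions
xT = np.ones((1, M))        # x^T, a row vector
WT = np.ones((M, N))        # W^T
b = np.arange(N)            # plain (N,) vector, no transpose defined

yT = xT @ WT + b            # b broadcasts across the row; result shape (1, N)
print(yT.shape)             # (1, 3)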

The rest is now pretty easy: if you have batches larger than 1 instance, you just "stack" multiple of the x (1xM) matrices, say into a matrix of dimensions (AxM) (where A is the batch size). b will hopefully be automatically broadcast to this number of events (that means to a matrix of dimension (AxN)). If you then use

y^T = x^T * W^T + b^T,

you will get an (AxN) matrix of the targets for each element of the batch.
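
As a quick numeric check (batch size, dimensions, and seed are arbitrary), the batched, transposed form gives exactly the per-example column-vector results:

import numpy as np

A, M, N = 4, 3, 2                 # hypothetical batch size and dimensions
rng = np.random.default_rng(0)
W = rng.normal(size=(N, M))       # maps dimension M -> N
b = rng.normal(size=N)
batch = rng.normal(size=(A, M))   # A instances stacked along the zeroth dimension

# Per-example form: y = W x + b, with x a column vector
per_example = np.stack([W @ x + b for x in batch])

# Batched "flipped" form: one matmul for the whole batch
batched = batch @ W.T + b         # (A, M) @ (M, N) + (N,) -> (A, N)

print(np.allclose(per_example, batched))  # True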
