
No broadcasting for tf.matmul in TensorFlow

I have a problem I've been struggling with. It is related to tf.matmul() and its lack of broadcasting.

I am aware of a similar issue (https://github.com/tensorflow/tensorflow/issues/216), but tf.batch_matmul() doesn't look like a solution for my case.

I need to encode my input data as a 4D tensor: X = tf.placeholder(tf.float32, shape=(None, None, None, 100)). The first dimension is the batch size, the second is the number of entries in the batch. You can imagine each entry as a composition of a number of objects (the third dimension). Finally, each object is described by a vector of 100 float values.

Note that I used None for the second and third dimensions because the actual sizes may change in each batch. However, for simplicity, let's shape the tensor with actual numbers: X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
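To make the None dimensions concrete, here is a minimal sketch (assuming a TF 1.x graph-mode session and random data): different batches can be fed with different sizes for the first three axes.

import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, shape=(None, None, None, 100))

feed_a = np.random.rand(5, 10, 4, 100).astype(np.float32)
feed_b = np.random.rand(3, 7, 2, 100).astype(np.float32)  # different sizes for the first three axes

with tf.Session() as sess:
    print(sess.run(tf.shape(X), feed_dict={X: feed_a}))  # [  5  10   4 100]
    print(sess.run(tf.shape(X), feed_dict={X: feed_b}))  # [  3   7   2 100]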

These are the steps of my computation:

  1. Compute a function of each vector of 100 float values (e.g., a linear function): W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1)); Y = tf.matmul(X, W). Problem: no broadcasting for tf.matmul(), and no success using tf.batch_matmul(). Expected shape of Y: (5, 10, 4, 50).

  2. Apply average pooling to each entry of the batch (over the objects of each entry): Y_avg = tf.reduce_mean(Y, 2). Expected shape of Y_avg: (5, 10, 50). (A NumPy sketch of both steps follows this list.)
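For clarity, here is a minimal NumPy sketch of the two steps (with random data); np.matmul broadcasts a 2-D weight matrix against the leading dimensions, which is the behavior I was hoping for from tf.matmul():

import numpy as np

X_np = np.random.rand(5, 10, 4, 100).astype(np.float32)
W_np = np.random.rand(100, 50).astype(np.float32)

Y_np = np.matmul(X_np, W_np)   # broadcasts over the leading axes, shape (5, 10, 4, 50)
Y_avg_np = Y_np.mean(axis=2)   # average over the objects axis, shape (5, 10, 50)

print(Y_np.shape, Y_avg_np.shape)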

I expected that tf.matmul() would support broadcasting. Then I found tf.batch_matmul(), but it still doesn't seem to apply to my case (e.g., W needs to have at least 3 dimensions, and it's not clear why).

By the way, above I used a simple linear function (whose weights are stored in W), but in my model I have a deep network instead. So the more general problem I have is automatically computing a function for each slice of a tensor. This is why I expected tf.matmul() to have broadcasting behavior (and if it did, maybe tf.batch_matmul() wouldn't even be necessary).

Looking forward to learning from you! Alessio

You could achieve that by reshaping X to shape [n, d], where d is the dimensionality of a single "instance" of the computation (100 in your example) and n is the number of those instances in your multi-dimensional object (5*10*4 = 200 in your example). After reshaping, you can use tf.matmul and then reshape back to the desired shape. The fact that the first three dimensions can vary makes this a little tricky, but you can use tf.shape to determine the actual shapes at run time. Finally, you can perform the second step of your computation, which should be a simple tf.reduce_mean over the respective dimension. All in all, it would look like this:

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=(None, None, None, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
X_ = tf.reshape(X, [-1, 100])                 # flatten to [n, 100]
Y_ = tf.matmul(X_, W)                         # plain 2-D matmul, [n, 50]
X_shape = tf.gather(tf.shape(X), [0, 1, 2])   # extract the first three (dynamic) dimensions
target_shape = tf.concat(0, [X_shape, [50]])  # pre-TF-1.0 argument order; in TF 1.x use tf.concat([X_shape, [50]], axis=0)
Y = tf.reshape(Y_, target_shape)              # back to [batch, entries, objects, 50]
Y_avg = tf.reduce_mean(Y, 2)                  # average over the objects dimension
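A quick way to sanity-check the resulting shapes (a sketch, assuming a graph-mode session with random input; the variable initializer name differs in older TF releases):

import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # tf.initialize_all_variables() in older releases
    out = sess.run(Y_avg, feed_dict={X: np.random.rand(5, 10, 4, 100).astype(np.float32)})
    print(out.shape)  # expected: (5, 10, 50)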

As the renamed title of the GitHub issue you linked suggests, you should use tf.tensordot(). It enables contraction of axis pairs between two tensors, in line with NumPy's tensordot(). For your case:

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=(5, 10, 4, 100))
W = tf.Variable(tf.truncated_normal([100, 50], stddev=0.1))
Y = tf.tensordot(X, W, [[3], [0]])  # contracts axis 3 of X with axis 0 of W, giving shape [5, 10, 4, 50]
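An equivalent way to express the same contraction (my addition, not part of the original answer) is tf.einsum, where the subscript string makes the contracted axis explicit:

Y = tf.einsum('abcd,de->abce', X, W)  # contracts the last axis of X with the first axis of W, shape [5, 10, 4, 50]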
