
How to matmul a 2D tensor with a 3D tensor in TensorFlow?

In NumPy you can multiply a 2D array with a 3D array, as in the example below:

>>> X = np.random.randn(3,5,4) # [3,5,4]
... W = np.random.randn(5,5) # [5,5]
... out = np.matmul(W, X) # [3,5,4]
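
A quick sanity check (a minimal sketch reusing the arrays above) that this is the same as multiplying W against each slice X[i]:

>>> out_loop = np.stack([np.matmul(W, X[i]) for i in range(3)]) # explicit per-slice product
... print(np.allclose(out, out_loop)) # True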

From my understanding, np.matmul() takes W and broadcasts it along the first dimension of X. But in TensorFlow this is not allowed:

>>> _X = tf.constant(X)
... _W = tf.constant(W)
... _out = tf.matmul(_W, _X)

ValueError: Shape must be rank 2 but is rank 3 for 'MatMul_1' (op: 'MatMul') with input shapes: [5,5], [3,5,4].

So is there an equivalent in TensorFlow for what np.matmul() does above? And what is the best practice in TensorFlow for multiplying a 2D tensor with a 3D tensor?

Try using tf.tile to match the dimensions of the matrices before multiplication. NumPy's automatic broadcasting doesn't seem to be implemented in TensorFlow; you have to do it manually.

W_T = tf.tile(tf.expand_dims(W,0),[3,1,1])

This should do the trick:

import numpy as np
import tensorflow as tf

X = np.random.randn(3,4,5) # note: shape (3,4,5), unlike the question's (3,5,4)
W = np.random.randn(5,5)

_X = tf.constant(X)
_W = tf.constant(W)
# Replicate W along a new leading batch dimension: (5,5) -> (3,5,5)
_W_t = tf.tile(tf.expand_dims(_W,0),[3,1,1])

with tf.Session() as sess:
    print(sess.run(tf.matmul(_X,_W_t))) # batched product X @ W, shape (3,4,5)
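
Note that this example, with X of shape (3,4,5), computes the batched product X @ W rather than the question's np.matmul(W, X). To reproduce the question exactly, tile W the same way and put it on the left (a minimal sketch reusing W from above):

X = np.random.randn(3,5,4)
_X = tf.constant(X)
_W_t = tf.tile(tf.expand_dims(tf.constant(W),0),[3,1,1]) # (5,5) -> (3,5,5)

with tf.Session() as sess:
    # Tiled W on the left reproduces np.matmul(W, X):
    print(np.allclose(sess.run(tf.matmul(_W_t,_X)), np.matmul(W, X))) # True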

You can use tf.tensordot instead:

tf.transpose(tf.tensordot(_W, _X, axes=[[1],[1]]),[1,0,2])
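
A full runnable version of this approach (a minimal sketch; tf.tensordot contracts W's axis 1 with X's axis 1, producing shape [5,3,4], and the tf.transpose moves the batch axis back to the front):

import numpy as np
import tensorflow as tf

X = np.random.randn(3,5,4)
W = np.random.randn(5,5)

_X = tf.constant(X)
_W = tf.constant(W)
# Contract over the shared dimension of size 5, then restore batch-first layout:
_out = tf.transpose(tf.tensordot(_W, _X, axes=[[1],[1]]), [1,0,2]) # [5,3,4] -> [3,5,4]

with tf.Session() as sess:
    print(np.allclose(sess.run(_out), np.matmul(W, X))) # True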

The following is from the TensorFlow XLA broadcasting semantics documentation:

The XLA language is as strict and explicit as possible, avoiding implicit and "magical" features. Such features may make some computations slightly easier to define, at the cost of more assumptions baked into user code that will be difficult to change in the long term.

So TensorFlow doesn't offer a built-in broadcasting feature.

However, it does offer an operation that can replicate a tensor as if it had been broadcast. This operation is called tf.tile.

The signature is as follows:

tf.tile(input, multiples, name=None)

This operation creates a new tensor by replicating input multiples times. The output tensor's i'th dimension has input.dims(i) * multiples[i] elements, and the values of input are replicated multiples[i] times along the 'i'th dimension.
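
For example, replicating a 2D matrix along a new leading dimension (a minimal sketch):

import tensorflow as tf

W = tf.constant([[1., 2.], [3., 4.]]) # shape (2,2)
W3 = tf.tile(tf.expand_dims(W, 0), [3, 1, 1]) # shape (3,2,2): three stacked copies of W

with tf.Session() as sess:
    print(sess.run(W3).shape) # (3, 2, 2)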

You can also use tf.einsum to avoid tiling the tensor:

tf.einsum("ab,ibc->iac", _W, _X)

A full example:

import numpy as np
import tensorflow as tf

# Numpy-style matrix multiplication:
X = np.random.randn(3,5,4)
W = np.random.randn(5,5)
np_WX = np.matmul(W, X)

# TensorFlow-style multiplication:
_X = tf.constant(X)
_W = tf.constant(W)
_WX = tf.einsum("ab,ibc->iac", _W, _X)

with tf.Session() as sess:
    tf_WX = sess.run(_WX)

# Check that the results are the same:
print(np.allclose(np_WX, tf_WX))

Here I'll use the Keras backend's K.dot and TensorFlow's tf.transpose. First, swap the inner dimensions of the 3D tensor:

X = tf.transpose(X, perm=[0,2,1]) # X shape=[3,4,5]; perm entries must be non-negative

Now multiply like so:

out = K.dot(X, tf.transpose(W)) # out shape=[3,4,5]; W is transposed so that swapping the axes back yields W @ X (K.dot(X, W) would yield W.T @ X)

And now swap the axes back:

out = tf.transpose(out, perm=[0,2,1]) # out shape=[3,5,4]

The solution above saves memory, at a small cost in time, because you are not tiling W.
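
Putting the pieces together and checking against np.matmul (a minimal sketch; assumes the Keras backend is available as tf.keras.backend):

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

X = np.random.randn(3,5,4)
W = np.random.randn(5,5)

_X = tf.transpose(tf.constant(X), perm=[0,2,1]) # [3,5,4] -> [3,4,5]
_out = K.dot(_X, tf.transpose(tf.constant(W))) # [3,4,5] x [5,5] -> [3,4,5]
_out = tf.transpose(_out, perm=[0,2,1]) # [3,4,5] -> [3,5,4]

with tf.Session() as sess:
    print(np.allclose(sess.run(_out), np.matmul(W, X))) # True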
