
Understanding tensor ranks and behaviour

I am starting out with tensorflow and I have a huge problem when it comes to the ranks of tensors and how they interact with each other.

I have the following code with me:

import tensorflow as tf

w = tf.Variable(tf.constant([0.2, 0.6]))
x = tf.placeholder(tf.float32)
y = w * x

As you can see, it is an incredibly simple setup.
However, when I execute print(w), the output is Tensor("Variable_13/read:0", shape=(2,), dtype=float32).
What is the meaning of shape=(2,)? What does the comma indicate?

Further, here are the other sore points after sess = tf.Session() and initialising the variables:

print(sess.run(y,{x:[1,2]})) 

[ 0.2 1.20000005]

print(sess.run(y,{x:[1]}))  

[ 0.2 0.60000002]

print(sess.run(y,{x:[[1],[2]]})) 

[[ 0.2 0.60000002]
[ 0.40000001 1.20000005]]

Why am I getting such a variety of behaviour? How does TensorFlow determine what counts as a single data point? I realise now that specifying a shape while declaring the placeholder is probably better than getting myself stuck like this.
I understand the last two cases as they were taught in class, but I am at a loss to explain the behaviour of the first case.

Your first question is a simple one: shape=(2,) refers to the dimensions of w. In numpy, a shape is always represented by a tuple of integers, like this:

>>> x = np.random.randn(50)
>>> x.shape
(50,)

This is a 1D array, and only one integer is specified in the shape. Now consider:

>>> x = np.random.randn(50, 50)
>>> x.shape
(50, 50)

This is a 2D array. As you can see, shape specifies the size of x along 2 dimensions.


To answer your second question: x is a placeholder, meaning it can take any value you feed it. That is precisely what the following feeds do: {x:[1,2]}, {x:[1]} and {x:[[1],[2]]}.

In the first case, x is assigned a 1D array of 2 elements, [1, 2]. In the second case, a 1D array with a single element, [1], and so on.

Now, the operation y = w * x above specifies that w should be multiplied with x. So, when you run sess.run(y, {x:[1,2]}), w is multiplied element-wise with the value fed to x, and the output you see changes depending on that value.

In the first case, [0.2, 0.6] * [1, 2] multiplies the elements at corresponding indices, and the result is [0.2 * 1, 0.6 * 2].

The second case is similar, except that [1] has only one element, so it is broadcast across w: each element of w is multiplied by 1, giving [0.2, 0.6].
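The first two results can be reproduced in plain NumPy, whose broadcasting semantics TensorFlow's element-wise ops follow (a minimal sketch using the values from the question):

```python
import numpy as np

w = np.array([0.2, 0.6], dtype=np.float32)

# Case 1: two 1D arrays of equal length multiply element-wise.
case1 = w * np.array([1, 2], dtype=np.float32)   # [0.2 * 1, 0.6 * 2]

# Case 2: the single element of [1] is broadcast across w.
case2 = w * np.array([1], dtype=np.float32)      # [0.2 * 1, 0.6 * 1]

print(case1)  # approximately [0.2, 1.2]
print(case2)  # approximately [0.2, 0.6]
```

The tiny offsets in the question's output (1.20000005 and so on) are just float32 rounding, not a difference in the arithmetic.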

In the third case, x has shape (2, 1). So each row of x is in turn multiplied element-wise with w to produce a separate row of the output, giving [[ 0.2, 0.60000002], [ 0.40000001, 1.20000005]].
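The third case can likewise be sketched in NumPy: a (2, 1) array multiplied with a (2,) array broadcasts to a (2, 2) result, one row per row of x:

```python
import numpy as np

w = np.array([0.2, 0.6], dtype=np.float32)   # shape (2,)
x = np.array([[1], [2]], dtype=np.float32)   # shape (2, 1)

y = w * x                                    # broadcasts to shape (2, 2)
print(y.shape)  # (2, 2)
print(y)        # approximately [[0.2, 0.6], [0.4, 1.2]]
```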

First question

shape=(2,) indicates the shape of the tensor. The trailing comma is Python tuple syntax: (2,) is a one-element tuple, describing a tensor with a single dimension of size 2. It does not mean the tensor itself is a tuple.

You can check this simply by running:

type((2))

which returns int, whilst

type((2,))

returns tuple .

Second question

You have just discovered broadcasting.

In short, in the first case you're multiplying the two input tensors element by element.

In the second case, you're effectively multiplying a tensor by a scalar: the single element of [1] is stretched across w.

In the third case, instead, you're multiplying each element of w by each element of x. This is because x has shape (2, 1): the 1 in the last dimension triggers a broadcasting rule that stretches both operands to a common (2, 2) shape.

You should read the broadcasting rules here for a better understanding: https://docs.scipy.org/doc/numpy-1.12.0/user/basics.broadcasting.html#general-broadcasting-rules
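As a quick check of the general rule (shapes are aligned from the trailing dimension; a dimension of size 1, or a missing dimension, is stretched), NumPy can report the broadcast result shape directly. A small sketch (np.broadcast_shapes requires NumPy 1.20+):

```python
import numpy as np

# Trailing dimensions are compared right to left; size-1 dims stretch.
print(np.broadcast_shapes((2,), (2,)))    # (2,)   -> case 1
print(np.broadcast_shapes((2,), (1,)))    # (2,)   -> case 2
print(np.broadcast_shapes((2,), (2, 1)))  # (2, 2) -> case 3
```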
