
How exactly does the CNTK “times” function work?

I am teaching myself CNTK by going through this tutorial.

There is a function defined as

import cntk as C

mydict = {}  # holds the learnable parameters so they can be inspected later

def linear_layer(input_var, output_dim):

    input_dim = input_var.shape[0]
    weight_param = C.parameter(shape=(input_dim, output_dim))
    bias_param = C.parameter(shape=(output_dim))

    mydict['w'], mydict['b'] = weight_param, bias_param

    return C.times(input_var, weight_param) + bias_param

In linear algebra, multiplying two matrices only makes sense when the inner dimensions match. More explicitly, the 2nd dimension of the left operand must equal the 1st dimension of the right operand.

In this function, the 2nd (right) argument of `times` is `weight_param`, whose first dimension is set to the first dimension of the first (left) argument. This should not work, because the inner dimensions do not match.

I am not sure whether input_var is (n x 2) or (2 x n), but either way should cause an error. If it is (n x 2), then weight_param is (n x 2), and (n x 2)*(n x 2) should not compute. If it is (2 x n), then weight_param is (2 x 2), and (2 x n)*(2 x 2) should not compute.

But somehow this runs just fine. Even more confusing: if I change the final line of the function to

return C.times_transpose(input_var, weight_param) + bias_param

it produces exactly the same results.

So what is going on here? Why does this work? Why does the transpose version produce the same results?

PS: I am teaching myself Python at the same time as CNTK, so these observations may be explained by something about Python. Let me know.

input_var is (n x 2), where n represents the batch axis. If you print input_var, you will get something like Input('Input7', [#], [2]); the # denotes the batch axis.

input_var.shape[0] returns 2, so the first dimension of weight_param is 2 (weight_param doesn't have a batch axis because it's a parameter, not a variable).

So inner dimensions match.
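The shape logic above can be sketched in NumPy (an assumption for illustration; CNTK handles the batch axis implicitly, here it is made explicit as the leading dimension):

```python
import numpy as np

n = 4                      # minibatch size (the batch axis, shown as "#")
input_dim, output_dim = 2, 2

x = np.ones((n, input_dim))            # input_var: (n, 2)
W = np.ones((input_dim, output_dim))   # weight_param: (2, 2), no batch axis
b = np.zeros(output_dim)               # bias_param: (2,)

# Inner dimensions are both 2, so the product is well defined:
out = x @ W + b
print(out.shape)  # (4, 2)
```

The key point is that `weight_param` never carries the batch axis, so the multiplication is (n, 2) by (2, 2), which is perfectly valid.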

C.times_transpose produces the same results because output_dim also happens to be 2 in the example.

The CNTK tutorial you are following just explains how to construct a logistic regression model, feed it data, and make predictions. The code is not exhaustive (you may find both a variable `feature` and `features`, which is not ideal, etc.). You should always print the shapes of your variables and tensors to make things clear. For the `times` operator, in short, it does the following:

times(A,B) computes A*B

times_transpose(A,B) computes A*(B^T)
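A minimal NumPy sketch of the difference (illustrative only; CNTK's `C.times` and `C.times_transpose` follow the same algebra on their dense arguments):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

times = A @ B        # like C.times(A, B): plain product A*B
times_t = A @ B.T    # like C.times_transpose(A, B): A times B transposed

print(times)    # [[19. 22.], [43. 50.]]
print(times_t)  # [[17. 23.], [39. 53.]]
```

With a square weight matrix both expressions are shape-compatible, which is why swapping in `times_transpose` runs at all when output_dim is 2; the learned parameters simply adapt to the transposed layout.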

Always be sure of your shapes; it is very important for keeping things clear.
