I am teaching myself CNTK by going through this tutorial.
There is a function defined as:
def linear_layer(input_var, output_dim):
    input_dim = input_var.shape[0]
    weight_param = C.parameter(shape=(input_dim, output_dim))
    bias_param = C.parameter(shape=(output_dim))
    mydict['w'], mydict['b'] = weight_param, bias_param
    return C.times(input_var, weight_param) + bias_param
In linear algebra, multiplying two matrices only makes sense when the inner dimensions match. More explicitly, the second dimension of the left matrix must equal the first dimension of the right matrix.
In this function, the second (right) argument of the times function is weight_param, whose first dimension is set to match the first dimension of the first (left) argument. This should not work, because the inner dimensions do not match.
I am not sure whether input_var is (n x 2) or (2 x n), but either way it should cause an error. If it is (n x 2), then weight_param is (n x 2), and (n x 2) * (n x 2) should not compute. If it is (2 x n), then weight_param is (2 x 2), and (2 x n) * (2 x 2) should not compute.
But somehow this runs just fine. Even more confusing: if I change the final line of the function to
return C.times_transpose(input_var, weight_param) + bias_param
it produces exactly the same results.
So what is going on here? Why does this work? Why does the transpose version produce the same results?
PS: I am teaching myself Python at the same time as CNTK, so these observations may be explained by something about Python. Let me know.
input_var is (n x 2), where n represents the batch axis. If you print input_var, you will get something like Input('Input7', [#], [2]), where # means the batch axis. input_var.shape[0] returns 2, so the first dimension of weight_param is 2 (weight_param doesn't have a batch axis because it's a parameter, not a variable). So the inner dimensions match.
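To see concretely why the shapes work out, here is a small NumPy sketch that mimics what C.times does in the linear_layer above. NumPy stands in for CNTK here: the batch axis n, which CNTK handles implicitly, becomes an explicit leading dimension, and the sizes 2 and 3 are illustrative choices, not taken from the tutorial.

```python
import numpy as np

n, input_dim, output_dim = 5, 2, 3  # a batch of 5 samples, 2 features each

x = np.random.randn(n, input_dim)      # input_var: batch axis first, shape (5, 2)
W = np.zeros((input_dim, output_dim))  # weight_param: shape (2, 3)
b = np.zeros(output_dim)               # bias_param: shape (3,)

# C.times(input_var, weight_param) + bias_param, in NumPy terms:
out = x @ W + b
print(out.shape)  # (5, 3): inner dimensions (2 and 2) match, so this computes
```

The key point is that weight_param's first dimension is input_dim (2), not n, so the product (n x 2) * (2 x output_dim) is perfectly valid.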
C.times_transpose produces the same results because output_dim also happens to be 2 in the example.
The CNTK tutorial you are following is just explaining how to construct a logistic regression model, feed it with data, and make predictions. The code is not exhaustive (for example, you may find both a variable feature and a variable features, which is not OK), so you should always print the shapes of your variables and tensors to make things clear. For the times operator, in short:
times(A, B) computes A * B
times_transpose(A, B) computes A * (B^T) (for two vectors, this is their inner product)
You always have to be sure of your shapes; it is very important to make things clear.
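A quick NumPy sketch of the two operators may help. Note that, per the CNTK API reference, times_transpose(A, B) transposes the right operand, computing A * B^T; with a square 2 x 2 weight matrix both calls are therefore shape-compatible, which is why swapping them raises no error even though the values generally differ. The matrices below are illustrative, not from the tutorial.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])   # e.g. a batch of two 2-dim inputs
B = np.array([[1., 0.],
              [5., 1.]])   # a square (2 x 2) weight matrix

times = A @ B              # C.times(A, B)           -> A * B
times_t = A @ B.T          # C.times_transpose(A, B) -> A * B^T

print(times.shape, times_t.shape)  # both (2, 2): B is square, so both are valid
```

Because B here is not symmetric, times and times_t hold different values; the shapes, however, agree, which is all that is needed for the graph to build without error.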