
Why theta*X and not theta'*X in practice?

While doing the MOOC on ML by Andrew Ng, he explains in theory that theta'*X gives us the hypothesis, yet in the coursework we use theta*X. Why is that?

theta'*X is used to calculate the hypothesis for a single training example, when X is a vector. You then have to take theta' to arrive at the h(x) definition.

In practice, since you have more than one training example, X is a matrix (your training set) with dimension "m x n", where m is the number of training examples and n the number of features.

Now, you want to calculate h(x) for all your training examples with your theta parameter in just one operation, right?

Here is the trick: theta has to be an n x 1 vector. Then, when you do the matrix-vector multiplication X*theta, you obtain an m x 1 vector containing h(x) for every training example in your training set (the X matrix). Matrix multiplication builds the vector h(x) row by row, doing the corresponding math, so each row equals the h(x) definition for the corresponding training example.
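As a concrete sanity check, here is a minimal sketch of the X*theta product in plain Python (rather than the course's Octave), with made-up numbers:

```python
# X is m x n: one training example per row, first column is the bias term 1.
X = [[1.0, 2.0],
     [1.0, 3.0],
     [1.0, 4.0]]          # m = 3 examples, n = 2 (bias + one feature)
theta = [0.5, 2.0]        # n x 1 parameter vector

# Matrix-vector product, row by row: h[i] = sum_j X[i][j] * theta[j]
h = [sum(x_ij * t_j for x_ij, t_j in zip(row, theta)) for row in X]
print(h)  # [4.5, 6.5, 8.5] -- one h(x) per training example
```

Each entry of h is exactly theta'*x(i) for the corresponding row x(i), which is why the single multiplication X*theta covers the whole training set.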

You can do the math by hand; I did it and now it is clear. Hope I can help someone. :)

In mathematics, a 'vector' is usually defined as a vertically-stacked array, e.g. [x1; x2; x3], and signifies a single point in 3-dimensional space.

A 'horizontal' vector typically signifies an array of observations, e.g. [x1, x2, x3] is a tuple of 3 scalar observations.

Equally, a matrix can be thought of as a collection of vectors. E.g., a 3 x 4 matrix is a collection of four 3-dimensional column vectors.

A scalar can be thought of as a 1 x 1 matrix, and therefore its transpose is the same as the original.

More generally, an n-by-m matrix W can also be thought of as a transformation from an m-dimensional vector x to an n-dimensional vector y, since multiplying that matrix with an m-dimensional vector yields a new n-dimensional one. If your 'matrix' W is 1 x n, then it denotes a transformation from an n-dimensional vector to a scalar.

Therefore, notationally, it is customary to introduce the problem from the mathematical-notation point of view, e.g. y = Wx.

However, for computational reasons, sometimes it makes more sense to perform the calculation as a "vector times a matrix" rather than a "matrix times a vector". Since (Wx)' === x'W', we sometimes solve the problem that way and treat x' as a horizontal vector. Also, if W is not a matrix but a scalar, then Wx denotes scalar multiplication, and therefore in that case Wx === xW.
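The (Wx)' === x'W' identity is easy to verify numerically; here is a small plain-Python check with made-up numbers:

```python
# W is 2x3, so it maps a 3-dimensional x to a 2-dimensional result.
W = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, 2]

# W x : matrix times column vector -> column vector in R^2
Wx = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

# x' W' : row vector times the transposed matrix -> row vector in R^2
Wt = [list(col) for col in zip(*W)]   # W' is 3x2
xW = [sum(x_j * wt_ji for x_j, wt_ji in zip(x, [r[j] for r in Wt]))
      for j in range(len(Wt[0]))]

print(Wx, xW)  # same numbers: [7, 16] read as a column vs a row
```

The two results hold the same numbers; only the orientation (column vs row) differs, which is exactly the point of the identity.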

I don't know the exercises you speak of, but my assumption would be that in the course he introduced theta as a proper, vertical vector, but then transposed it to perform proper calculations, i.e. a transformation from a vector of n dimensions to a scalar (which is your prediction).

Then in the exercises, presumably you were either dealing with a scalar 'theta', so there was no point transposing it and it was left as theta for convenience; or theta was defined as a horizontal (i.e. transposed) vector to begin with for some reason (e.g. printing convenience), and was then left in that state when performing the necessary transformation.

I don't know what the dimensions of your theta and X are (you haven't provided anything), but it all depends on the dimensions of X, theta, and the hypothesis. Let's say m is the number of features and n the number of examples. Then, if theta is an m x 1 vector and X is an n x m matrix, X*theta is an n x 1 hypothesis vector.

But you will get the same result if you calculate theta'*X. You can also get the same result with theta*X if theta is 1 x m and X is m x n.

Edit:

As @Tasos Papastylianou pointed out, the same result will be obtained if X is m x n: then (theta.'*X).' or X.'*theta are the answers. If the hypothesis should be a 1 x n vector, then theta.'*X is the answer. If theta is 1 x m, X is m x n, and the hypothesis is 1 x n, then theta*X is also a correct answer.

I had the same problem (ML course, linear regression). After spending some time on it, here is how I see it: there is a confusion between the x(i) vector and the X matrix.

About the hypothesis h(xi) for an xi vector (xi belongs to R3x1), theta belongs to R3x1:

    theta  = [t0; t1; t2]     # R(3x1)
    theta' = [t0 t1 t2]       # R(1x3)
    xi     = [1; xi,1; xi,2]  # R(3x1)
    theta' * xi => t0 + t1.xi,1 + t2.xi,2

= h(xi) (which is R1x1 => a real number)

So theta'*xi works here.
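The single-example product above can be sketched in plain Python (made-up values for xi and theta):

```python
# theta and xi are both 3x1 column vectors, so theta' * xi is 1x1: a real number.
theta = [0.5, 1.0, 2.0]   # [t0; t1; t2]
xi = [1.0, 3.0, 4.0]      # [1; xi,1; xi,2] -- the leading 1 multiplies t0

h_xi = sum(t * x for t, x in zip(theta, xi))  # t0 + t1*xi,1 + t2*xi,2
print(h_xi)  # 11.5
```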

About the vectorized equation: in this case X is not the same thing as x (a vector). It is a matrix with m rows and n+1 columns (m = number of examples and n = number of features, to which we add the t0 term).

Therefore, from the previous example with n = 2, the matrix X is an m x 3 matrix:

    X = [1 x0,1 x0,2 ; 1 x1,1 x1,2 ; .... ; 1 xi,1 xi,2 ; ... ; 1 xm,1 xm,2]

If you want to vectorize the equation for the algorithm, you need to consider that for each row i you will have h(xi) (a real number), so you need to implement X * theta.

That gives you, for each row i:

    [1 xi,1 xi,2] * [t0 ; t1 ; t2] = t0 + t1.xi,1 + t2.xi,2

Hope it helps.

I have used Octave notation and syntax for writing matrices: 'comma' for separating column items, 'semicolon' for separating row items, and 'single quote' for transpose.

In the course theory under discussion, theta = [theta0; theta1; theta2; theta3; .... thetaf].

'theta' is therefore a column vector, or a '(f+1) x 1' matrix. Here 'f' is the number of features. theta0 is the intercept term.

With just one training example, x is a '(f+1) x 1' matrix, or a column vector. Specifically, x = [x0; x1; x2; x3; .... xf], where x0 is always '1'.

In this special case, the '1 x (f+1)' matrix formed by taking theta' can be multiplied with x to give the correct '1 x 1' hypothesis matrix, i.e. a real number.

h = theta' * x is a valid expression.

But the coursework deals with multiple training examples. If there are 'm' training examples, X is an 'm x (f+1)' matrix.

To simplify, let there be two training examples, each with 'f' features.

X = [x1; x2].

(Please note: the 1 and 2 inside the brackets are not exponents but indexes for the training examples.)

Here, x1 = [x0¹, x1¹, x2¹, x3¹, .... xf¹] and x2 = [x0², x1², x2², x3², .... xf²].

So X is a '2 x (f+1)' matrix.

Now, to answer the question: theta' is a '1 x (f+1)' matrix and X is a '2 x (f+1)' matrix. With this, the following expressions are not valid:

  1. theta' * X
  2. theta * X

The expected hypothesis matrix 'h' should have two predicted values (two real numbers), one for each of the two training examples. 'h' is a '2 x 1' matrix, or a column vector.

The hypothesis can be obtained only by using the expression X * theta, which is valid and algebraically correct: multiplying a '2 x (f+1)' matrix with a '(f+1) x 1' matrix yields the '2 x 1' hypothesis matrix.
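A plain-Python sketch (made-up numbers) of why X * theta is the valid product here, while theta' * X fails the dimension check:

```python
X = [[1, 2, 3],
     [1, 4, 5]]           # 2 x (f+1) with f = 2: two examples as rows
theta = [1, 10, 100]      # (f+1) x 1

def matvec(A, v):
    # The product is defined only when the columns of A match len(v).
    assert all(len(row) == len(v) for row in A), "inner dimensions must agree"
    return [sum(a * b for a, b in zip(row, v)) for row in A]

h = matvec(X, theta)      # 2 x 1 hypothesis: one prediction per example
print(h)  # [321, 541]

# theta' * X would be (1 x 3) times (2 x 3): inner dimensions 3 and 2
# disagree, so that product is simply undefined.
print(len(theta) == len(X))  # False
```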

This is because the computer has the coordinate (0,0) positioned at the top left, while geometry has the coordinate (0,0) positioned at the bottom left.


When Andrew Ng first introduced x in the cost function J(theta), x is a column vector, aka

[x0; x1; ... ; xn]

i.e. 

x0;
x1;
...;
xn

However, in the first programming assignment, we are given X, which is an (m * n) matrix (# training examples * features per training example). The discrepancy comes from the fact that, in the file, the individual x vectors (training samples) are stored as horizontal row vectors rather than vertical column vectors!

This means the X matrix you see is actually an X' (X transpose) matrix!

Since we have X', we need to make our code work given that our equation expects h(theta) = theta' * X (when the vectors in matrix X are column vectors).

We have the linear algebra identity for matrix and vector multiplication:

(A*B)' == (B') * (A'), as shown in Properties of Transposes.

let t = theta,
given, h(t) = t' * X
h(t)' = (t' X)'
= X' * t

Now we have our variables in the format they were actually given to us. What I mean is that our input file really holds X', and theta is normal, so multiplying them in the order specified above gives a practically equivalent output to the one he taught us to use, which was theta' * X. Since we sum all the elements of h(t)' at the end, it doesn't matter that it is transposed for the final calculation. However, if you wanted h(t), not h(t)', you could always take your computed result and transpose it, because

(A')' == A
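A quick plain-Python check (made-up numbers) that X' * theta produces the same values as (theta' * X)', just as a column instead of a row:

```python
# The file stores examples as rows, so what we load is really X'.
X_file = [[1, 2],
          [1, 4],
          [1, 6]]          # m = 3 examples, n = 2 -> this is X'
theta = [3, 5]

# h(t)' = X' * t : dot each stored row with theta
h_col = [sum(a * b for a, b in zip(row, theta)) for row in X_file]

# h(t) = t' * X : same computation against the "true" X (examples as columns)
X = [list(row) for row in zip(*X_file)]        # n x m
h_row = [sum(t * x_j for t, x_j in zip(theta, [r[j] for r in X]))
         for j in range(len(X[0]))]

print(h_col, h_row)  # identical values: [13, 23, 33]
```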

However, for the Coursera machine learning programming assignment 1, this is unnecessary.
