
Gradient Descent and Normal Equation give different theta values for multivariate linear regression. Why?

Vectorized implementation of gradient descent:

for iter = 1:num_iters
    % vectorized update of all parameters simultaneously
    theta = theta - (alpha / m) * X' * (X * theta - y);
    % record the cost at every iteration to monitor convergence
    J_history(iter) = computeCostMulti(X, y, theta);
end

Implementation of computeCostMulti():

function J = computeCostMulti(X, y, theta)
    m = length(y);
    % vectorized squared-error cost: J = (1/(2m)) * sum((X*theta - y).^2)
    J = 1 / (2 * m) * (X * theta - y)' * (X * theta - y);
end

Normal equation implementation:

theta = pinv(X' * X) * X' * y;

These two implementations converge to different values of theta for the same values of X and y. The Normal Equation gives the right answer, but gradient descent gives a wrong answer.

Is there anything wrong with the implementation of gradient descent?

I suppose that when you use gradient descent, you first process your input using feature scaling. That is not done with the normal equation method (feature scaling is not required for it), so it should result in a different theta. If you use both models to make predictions, they should come up with the same result.

It doesn't matter. Since you are not applying feature scaling when using the normal equation, you'll find that the predictions are the same (see the sketch below).
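
A minimal Octave sketch of this point, assuming a featureNormalize helper and a gradientDescentMulti function in the style of the usual course exercise, plus X_raw (features without the bias column), m, alpha and num_iters already defined; none of these names come from the post itself:

    % Illustrative sketch; featureNormalize and gradientDescentMulti are assumed helpers.
    [X_norm, mu, sigma] = featureNormalize(X_raw);  % column-wise (x - mu) ./ sigma
    X_gd = [ones(m, 1), X_norm];                    % scaled design matrix for gradient descent
    X_ne = [ones(m, 1), X_raw];                     % unscaled design matrix for the normal equation

    theta_gd = gradientDescentMulti(X_gd, y, zeros(size(X_gd, 2), 1), alpha, num_iters);
    theta_ne = pinv(X_ne' * X_ne) * X_ne' * y;
    % theta_gd and theta_ne differ numerically because they are fit in different
    % feature spaces, but both describe the same fitted model.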

Nobody promised you that gradient descent with a fixed step size will converge within num_iters iterations, even to a local optimum. You need to iterate until some well-defined convergence criterion is met (e.g. the gradient is close to zero), as in the sketch below.
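
A minimal sketch of such a stopping criterion in Octave (the tolerance and iteration cap are illustrative values, not from the post):

    tol       = 1e-8;   % illustrative tolerance on the gradient norm
    max_iters = 1e6;    % safety cap on the number of iterations

    for iter = 1:max_iters
        grad = (1 / m) * X' * (X * theta - y);  % gradient of the cost
        if norm(grad) < tol
            break;                              % converged: gradient close to zero
        end
        theta = theta - alpha * grad;           % fixed-step-size update
    end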

If you have normalized the training data before gradient descent, you should also normalize the new input data the same way when making a prediction (a worked sketch follows the list below). Concretely, your new input should look like:

[1, (x-mu)/sigma]

where:
- 1 is the bias term
- mu is the mean of the training data
- sigma is the standard deviation of the training data
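
Putting it together, a minimal sketch of predicting with both models; x_new, mu, sigma, theta_gd and theta_ne follow the illustrative names from the earlier sketch and the raw feature values are made up for the example:

    x_new   = [1650, 3];                              % example raw input (illustrative values)
    pred_gd = [1, (x_new - mu) ./ sigma] * theta_gd;  % scale with the training mu/sigma first
    pred_ne = [1, x_new] * theta_ne;                  % normal-equation model uses raw features
    % pred_gd and pred_ne should agree, up to gradient-descent convergence error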
