
Gradient Descent and Normal Equation give different theta values for multivariate linear regression. Why?

Vectorized implementation of gradient descent

for iter = 1:num_iters

 theta = theta - (alpha / m) * X' * (X * theta - y);   
 J_history(iter) = computeCostMulti(X, y, theta);

end

Implementation of computeCostMulti()

function J = computeCostMulti(X, y, theta)
 m = length(y);
 J = 1 / (2 * m) * (X * theta - y)' * (X * theta - y);
end

Normal equation implementation

theta = pinv(X' * X) * X' * y;

These two implementations converge to different values of theta for the same X and y. The normal equation gives the right answer, but gradient descent gives a wrong one.

Is there anything wrong with the implementation of Gradient Descent?

I suppose that when you use gradient descent, you first process your input with feature scaling. That is not done with the normal equation method (feature scaling is not required there), and that results in a different theta. If you use both models to make predictions, they should come up with the same result.
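A minimal NumPy sketch of this point (the data and learning rate below are made up for illustration, not taken from the question): gradient descent on scaled features and the normal equation on raw features produce different theta vectors, yet make the same predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 100, size=(50, 2))      # two raw features
y = X_raw @ np.array([1.5, -2.0]) + 3.0        # a known linear relationship

# Feature scaling, then gradient descent (mirrors the Octave update rule)
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X_scaled = np.c_[np.ones(50), (X_raw - mu) / sigma]
theta = np.zeros(3)
alpha, m = 0.1, 50
for _ in range(2000):
    theta = theta - (alpha / m) * X_scaled.T @ (X_scaled @ theta - y)

# Normal equation on the unscaled features (plus a bias column)
X_ne = np.c_[np.ones(50), X_raw]
theta_ne = np.linalg.pinv(X_ne.T @ X_ne) @ X_ne.T @ y

# The theta vectors differ, but predictions for a new point agree,
# provided the new input is scaled with the training mu and sigma.
x_new = np.array([40.0, 70.0])
pred_gd = np.r_[1.0, (x_new - mu) / sigma] @ theta
pred_ne = np.r_[1.0, x_new] @ theta_ne
print(np.allclose(pred_gd, pred_ne))           # True
```

The two parameterizations describe the same fitted hyperplane, just in different coordinates.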

It doesn't matter. Since you don't apply feature scaling when using the normal equation, you'll find that the predictions are the same.

Nobody promised you that gradient descent with a fixed step size will converge within num_iters iterations, even to a local optimum. You need to iterate until some well-defined convergence criterion is met (e.g., the gradient is close to zero).
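A sketch of such a stopping rule, in NumPy (the function name, tolerance, and toy data are my own choices, not from the question): instead of running a fixed number of iterations, stop once the gradient norm drops below a tolerance.

```python
import numpy as np

def gradient_descent(X, y, alpha, tol=1e-8, max_iters=100_000):
    """Run gradient descent until the gradient is close to zero."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(max_iters):
        grad = (1 / m) * X.T @ (X @ theta - y)
        if np.linalg.norm(grad) < tol:      # convergence criterion met
            break
        theta = theta - alpha * grad
    return theta

# Toy data where y = 1 + 2x exactly, so theta should approach [1, 2]
X = np.c_[np.ones(5), np.arange(5.0)]
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
theta = gradient_descent(X, y, alpha=0.1)
print(theta)                                # close to [1, 2]
```

With a fixed num_iters, the loop may stop long before this criterion is satisfied, which is one plausible cause of the mismatch in the question.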

If you normalized the training data before gradient descent, you must also normalize your input data for prediction. Concretely, your new input should look like:

[1, (x-mu)/sigma]

where:
- 1 is the bias term
- mu is the mean of the training data
- sigma is the standard deviation of the training data
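A small NumPy sketch of this prediction step (the training data and new input below are invented for illustration; theta is fit here via pinv on the scaled data for brevity, where gradient descent would converge to the same values):

```python
import numpy as np

# Hypothetical training data; y = 2 * (first feature) exactly
X_train = np.array([[100.0, 3.0], [150.0, 4.0], [200.0, 5.0], [120.0, 3.0]])
y = np.array([200.0, 300.0, 400.0, 240.0])

# mu and sigma come from the TRAINING data and are reused at prediction time
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_scaled = np.c_[np.ones(len(X_train)), (X_train - mu) / sigma]
theta = np.linalg.pinv(X_scaled.T @ X_scaled) @ X_scaled.T @ y

# A new input must be scaled with the same mu and sigma: [1, (x - mu)/sigma]
x_new = np.array([160.0, 4.0])
x_vec = np.r_[1.0, (x_new - mu) / sigma]
prediction = x_vec @ theta
print(prediction)                           # 320.0, i.e. 2 * 160
```

Forgetting this step, and feeding the raw x into a theta learned on scaled features, is another common source of "wrong" gradient descent answers.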
