Gradient Descent and Normal Equation give different theta values for multivariate linear regression. Why?

Question

Vectorized implementation of gradient descent

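m = length(y);                     % number of training examples
J_history = zeros(num_iters, 1);   % cost after each iteration

for iter = 1:num_iters
  % Simultaneous update of all parameters using the vectorized gradient
  theta = theta - (alpha / m) * X' * (X * theta - y);
  J_history(iter) = computeCostMulti(X, y, theta);
end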

Implementation of computeCostMulti()

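function J = computeCostMulti(X, y, theta)
  % Squared-error cost for linear regression with multiple features
  m = length(y);          % number of training examples
  r = X * theta - y;      % residual vector
  J = (r' * r) / (2 * m);
end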

Normal equation implementation

theta = pinv(X' * X) * X' * y;
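For reference, both snippets minimize the same cost. Differentiating the cost in computeCostMulti gives the gradient used in the update step, and setting that gradient to zero gives the normal equation. So with the same X (in particular, no feature scaling on one side only), a fully converged gradient descent should land on the same theta whenever X'X is invertible. When X'X is singular, pinv(X' * X) * X' * y returns the minimum-norm least-squares solution instead.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,(X\theta - y)^\top (X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m}\,X^\top (X\theta - y) \\
\theta &\leftarrow \theta - \alpha\,\nabla_\theta J(\theta) = \theta - \frac{\alpha}{m}\,X^\top (X\theta - y) \\
\nabla_\theta J(\theta) = 0 \;&\Longrightarrow\; X^\top X\,\theta = X^\top y \;\Longrightarrow\; \theta = (X^\top X)^{-1} X^\top y
\end{aligned}
```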

These two implementations produce different values of theta for the same X and y. The normal equation gives the right answer, but gradient descent gives a wrong one.

Is there anything wrong with the implementation of Gradient Descent?

Answer 1

I suppose that when you use gradient descent, you first preprocess your input with feature scaling. That is not done with the normal equation method (feature scaling is not required there), and that results in a different theta. If you use both models to make predictions, they should give the same result.
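A small Octave sketch of this point, using made-up housing-style data (the numbers, alpha = 0.1 and the 5000-iteration count are illustrative assumptions, not from the question). Gradient descent on z-score-scaled features and the normal equation on raw features give different theta vectors, yet mapping the gradient-descent theta back to the raw scale recovers the normal-equation theta, and both give the same prediction when the new input is scaled consistently.

```octave
% Made-up training data: [size in sq ft, number of bedrooms] -> price
X_raw = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4; 1985 4];
y = [399900; 329900; 369000; 232000; 539900; 299900];
m = size(X_raw, 1);

% Normal equation on the raw (unscaled) features
X = [ones(m, 1), X_raw];
theta_ne = pinv(X' * X) * X' * y;

% Gradient descent on z-score-scaled features
mu = mean(X_raw);                            % 1x2 row of column means
sigma = std(X_raw);                          % 1x2 row of column std devs
Xs = [ones(m, 1), (X_raw - mu) ./ sigma];    % implicit broadcasting
theta_gd = zeros(3, 1);
alpha = 0.1;
for iter = 1:5000
  theta_gd = theta_gd - (alpha / m) * Xs' * (Xs * theta_gd - y);
end

% The two parameter vectors live on different feature scales
disp([theta_ne, theta_gd]);

% Mapping theta_gd back to the raw feature scale recovers theta_ne
theta_back = [theta_gd(1) - sum(theta_gd(2:end) .* mu' ./ sigma');
              theta_gd(2:end) ./ sigma'];

% Predictions agree when the new input is scaled with the same mu and sigma
x_new = [1650 3];
p_ne = [1, x_new] * theta_ne;
p_gd = [1, (x_new - mu) ./ sigma] * theta_gd;
fprintf('normal equation: %.2f   gradient descent: %.2f\n', p_ne, p_gd);
```

The broadcasting in (X_raw - mu) ./ sigma works in Octave and in MATLAB R2016b or later; older MATLAB would need bsxfun.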

Answer 2

It doesn't matter. Since you don't apply feature scaling when using the normal equation, theta will differ, but you'll find that the predictions are the same.

Answer 3

Nobody promised you that gradient descent with a fixed step size will converge within num_iters iterations, even to a local optimum. You need to iterate until some well-defined convergence criterion is met (e.g. the gradient is close to zero).
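A minimal sketch of gradient descent with an explicit stopping rule, assuming the same vectorized gradient as in the question. The tolerance tol, the safety cap max_iters and the toy data are hypothetical choices for illustration; the loop stops once the gradient norm drops below tol and then compares the result with the normal equation.

```octave
% Toy data (made up): first column is the bias term, y = 2*x exactly
X = [1 1; 1 2; 1 3; 1 4];
y = [2; 4; 6; 8];
m = length(y);

theta = zeros(2, 1);
alpha = 0.1;
tol = 1e-9;           % hypothetical convergence tolerance on the gradient norm
max_iters = 100000;   % hypothetical safety cap so the loop always terminates

for iter = 1:max_iters
  grad = (1 / m) * X' * (X * theta - y);   % gradient of the squared-error cost
  if norm(grad) < tol
    break;                                 % converged: gradient is close to zero
  end
  theta = theta - alpha * grad;
end

theta_ne = pinv(X' * X) * X' * y;
fprintf('stopped after %d iterations\n', iter);
disp([theta, theta_ne]);
```

Because the linear-regression cost is convex, a near-zero gradient means theta is near the normal-equation solution. On unscaled features the same loop may need many more iterations, or a smaller alpha, to get there.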

Answer 4

If you normalized the training data before gradient descent, you must apply the same normalization to the input data you predict on. Concretely, your new input data should look like:

[1, (x-mu)/sigma]

where:
- 1 is the bias term
- mu is the mean of the training data
- sigma is the standard deviation of the training data
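A sketch of the prediction step with a single made-up feature (the data, alpha and iteration count are illustrative assumptions). The mu and sigma computed on the training set are reused for the new input; feeding the raw input to the same theta gives a wrong prediction.

```octave
% Made-up training data with one feature; y = 2*x + 1 exactly
x_train = [1; 2; 3; 4; 5];
y = [3; 5; 7; 9; 11];
m = length(y);

% Normalize the training feature and keep mu and sigma for later
mu = mean(x_train);
sigma = std(x_train);
X = [ones(m, 1), (x_train - mu) / sigma];

% Gradient descent on the normalized training data
theta = zeros(2, 1);
alpha = 0.1;
for iter = 1:1000
  theta = theta - (alpha / m) * X' * (X * theta - y);
end

% Predict for a new input x = 6 (true value is 2*6 + 1 = 13)
x_new = 6;
p_right = [1, (x_new - mu) / sigma] * theta;  % same normalization as training
p_wrong = [1, x_new] * theta;                 % raw input: wrong prediction
fprintf('normalized input: %.4f   raw input: %.4f\n', p_right, p_wrong);
```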

4 answers

solution1 3 2017-11-02 07:23:25

solution2 2 2019-01-04 19:12:13

solution3 0 2017-11-02 13:12:46

solution4 0 2020-05-22 09:54:47
