
Regularization while avoiding theta(1)

I am completing week 3 of the Stanford Machine Learning course on Coursera, and the submission grader gave me feedback that theta(1) should not be regularized.

I tried to simply compute the regularization term from theta(2) onward, as follows:

J = (1/m) * sum((-y .* log(h)) - ((1-y) .* log(1-h))) + (lambda/(2*m)) * sum(theta(2:end) .^ 2);

grad = 1/m * sum((h - y) .* X) + lambda .* theta ./ m;

which did not work. I eventually found a similar program online and changed my code to

J = (1/m) * sum((-y .* log(h)) - ((1-y) .* log(1-h))) + (lambda/(2*m)) * sum(theta(2:end) .^ 2);

grad = ((h - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m;

which worked, but I don't understand the purpose of the matrix [0; ones(length(theta)-1, 1)] in the code. Can someone explain it to me?

I completed this course a while back, so I can't be certain without seeing the exact equation, but to the best of my knowledge theta(1) is the bias (intercept) term, and there is no point in regularizing or penalizing the bias. The element-wise multiplication by [0; ones(length(theta)-1, 1)] zeroes out the bias component of the regularization term (it is multiplied by 0), while the rest of the thetas are left as they are (multiplied by 1) when computing the gradient. So in the end, theta(1) is excluded from the penalty. Hope this helps.
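To make the effect of the mask concrete, here is a minimal NumPy translation of the accepted gradient (a sketch, not the course's official solution; it assumes a design matrix X whose first column is the bias column of ones, and all variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y, lam):
    """Regularized logistic-regression gradient, bias term unpenalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    # mask = [0, 1, 1, ...]: zeroes the regularization contribution for
    # theta[0] (the bias), mirroring [0; ones(length(theta)-1, 1)] in Octave.
    mask = np.ones_like(theta)
    mask[0] = 0.0
    return (X.T @ (h - y)) / m + lam * theta * mask / m

# Toy data (illustrative): 3 examples, bias column plus one feature.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.5, -0.3])

g_reg = gradient(theta, X, y, lam=10.0)
g_noreg = gradient(theta, X, y, lam=0.0)
```

Comparing `g_reg` and `g_noreg` shows the point of the mask: the bias component of the gradient is identical with and without lambda, while every other component picks up the `lam * theta / m` penalty term.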
