
Regularization while avoiding theta(1)

I am completing week 3 of the Stanford Machine Learning course on Coursera, and the submission grader gave me feedback that theta(1) should not be regularized.

I tried to simply compute the regularization term from theta(2) onward, as follows:

J = (1/m) * sum((-y .* log(h)) - ((1-y) .* log(1-h))) + (lambda/(2*m)) * sum(theta(2:end) .^ 2);

grad = 1/m * sum((h - y) .* X) + lambda .* theta ./ m;

which did not work. I eventually found a similar program online and changed my code to

J = (1/m) * sum((-y .* log(h)) - ((1-y) .* log(1-h))) + (lambda/(2*m)) * sum(theta(2:end) .^ 2);

grad = ((h - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m;

which worked, but I don't understand the purpose of the matrix [0; ones(length(theta)-1, 1)] in the code. Can someone explain it to me?

I completed this course a while back, so I can't be certain without seeing the exact equation, but to the best of my knowledge theta(1) is the bias (intercept) term, and there is no point in regularizing or penalizing the bias. The element-wise multiplication by [0; ones(length(theta)-1, 1)] zeroes out the bias component of the regularization term (it is multiplied by 0), while the rest of the thetas are left as they are (multiplied by 1) when computing the gradient. So in the end, theta(1) is excluded from the penalty. Hope this helps.
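To make the effect of the mask concrete, here is a minimal NumPy translation of the accepted gradient (a sketch, not the course's official solution; it assumes a design matrix X whose first column is the bias column of ones, and all variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y, lam):
    """Regularized logistic-regression gradient, bias term unpenalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    # mask = [0, 1, 1, ...]: zeroes the regularization contribution for
    # theta[0] (the bias), mirroring [0; ones(length(theta)-1, 1)] in Octave.
    mask = np.ones_like(theta)
    mask[0] = 0.0
    return (X.T @ (h - y)) / m + lam * theta * mask / m

# Toy data (illustrative): 3 examples, bias column plus one feature.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.5, -0.3])

g_reg = gradient(theta, X, y, lam=10.0)
g_noreg = gradient(theta, X, y, lam=0.0)
```

Comparing `g_reg` and `g_noreg` shows the point of the mask: the bias component of the gradient is identical with and without lambda, while every other component picks up the `lam * theta / m` penalty term.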
