I am having some difficulty implementing logistic regression, specifically with how I should proceed step by step. So far I have implemented it in the following way:
First I take theta with length equal to the number of features and make it an n*1 vector of zeros. Then I use this theta to compute the following:
htheta = sigmoid(theta' * X');
theta = theta - (alpha/m) * sum (htheta' - y)'*X
Next I use the theta computed in the first step to compute the cost function:
J = 1/m *((sum(-y*log(htheta))) - (sum((1-y) * log(1 - htheta)))) + lambda/(2*m) * sum(theta).^2
Finally I compute the gradient:
grad = (1/m) * sum ((sigmoid(X*theta) - y')*X);
Since I initialize theta to all zeros, I am getting the same value of J throughout the vector. Is this the right output?
You are computing the gradient in the last step, but it has already been computed earlier, when you update theta. Moreover, your definition of the cost function contains a regularization term, yet that term is not incorporated in the gradient computation. Here is a working version without the regularization:
% generate dummy data for testing
y = randi(2,[10,1]) - 1;          % binary labels in {0,1}
X = [ones(10,1) randn([10,1])];   % design matrix with an intercept column

% define the sigmoid, since base MATLAB/Octave has no built-in one
sigmoid = @(z) 1 ./ (1 + exp(-z));

% initialize
alpha = 0.1;                      % learning rate
theta = zeros(1,size(X,2));       % one parameter per column of X
J = NaN(100,1);                   % cost per iteration

% loop a fixed number of times => can improve this by stopping when the
% cost function no longer decreases
htheta = sigmoid(X*theta');
for n = 1:100
    grad = X' * (htheta - y);          % gradient of the cost w.r.t. theta
    theta = theta - alpha*grad';       % gradient-descent update
    htheta = sigmoid(X*theta');        % recompute predictions
    J(n) = -y'*log(htheta) - (1-y)'*log(1 - htheta); % cost function
end
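The loop above runs a fixed number of iterations; the improvement mentioned in the comment (stopping once the cost no longer decreases) could be sketched as follows, reusing the same X, y, alpha, theta initialization, and sigmoid as above. The tolerance tol and the iteration cap of 1000 are assumed values, not part of the original answer:

```matlab
tol = 1e-6;                 % assumed convergence tolerance
J = NaN(1000,1);
theta = zeros(1,size(X,2));
htheta = sigmoid(X*theta');
for n = 1:1000
    grad = X' * (htheta - y);
    theta = theta - alpha*grad';
    htheta = sigmoid(X*theta');
    J(n) = -y'*log(htheta) - (1-y)'*log(1 - htheta);
    % stop when the cost decrease between iterations falls below tol
    if n > 1 && abs(J(n-1) - J(n)) < tol
        J = J(1:n);         % trim the unused entries
        break
    end
end
```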
If you now plot the cost function J, you will see (up to randomness in the dummy data) that it converges after about 15 iterations.
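If you do want the regularization term from your cost function, it also has to appear in the gradient. A minimal sketch of how the loop could be extended, assuming the same dummy data, alpha, and sigmoid as above and a hand-picked lambda = 1 (an assumed value); note that the intercept parameter theta(1) is conventionally left unpenalized:

```matlab
lambda = 1;                               % assumed regularization strength
theta = zeros(1,size(X,2));
htheta = sigmoid(X*theta');
for n = 1:100
    reg = lambda * [0; theta(2:end)'];    % do not penalize the intercept
    grad = X' * (htheta - y) + reg;       % gradient now includes regularization
    theta = theta - alpha*grad';
    htheta = sigmoid(X*theta');
    J(n) = -y'*log(htheta) - (1-y)'*log(1 - htheta) ...
           + lambda/2 * sum(theta(2:end).^2);  % penalized cost
end
```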