
What is wrong with my Gradient Descent algorithm

Hi, I'm trying to implement the gradient descent algorithm for a function:

[Image: the error function E(u,v) = (u*e^v - 2v*e^(-u))^2]

My starting point for the algorithm is w = (u,v) = (2,2), the learning rate is eta = 0.01, and the error bound is bound = 10^-14. Here is my MATLAB code:

function [resultTable, boundIter] = gradientDescent(w, iters, bound, eta)
% FUNCTION [resultTable, boundIter] = gradientDescent(w, iters, bound, eta)
% 
% DESCRIPTION: 
% - This function will do gradient descent error minimization for the
% function E(u,v) = (u*exp(v) - 2*v*exp(-u))^2.
%
% INPUTS: 
% 'w' a 1-by-2 vector indicating initial weights w = [u,v]
% 'iters' a positive integer indicating the number of gradient descent
% iterations
% 'bound' a real number indicating an error lower bound
% 'eta' a positive real number indicating the learning rate of GD algorithm
%
% OUTPUTS: 
% 'resultTable' an (iters+1)-by-6 matrix holding the error, partial
% derivatives and weights for each GD iteration
% 'boundIter' a positive integer specifying the GD iteration when the error
% function got below the given error bound 'bound'
% 


% The error function 
E = @(u,v) (u*exp(v) - 2*v*exp(-u))^2;

% Partial derivative of E with respect to u 
pEpu = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(exp(v) + 2*v*exp(-u));
% Partial derivative of E with respect to v 
pEpv = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(u*exp(v) - 2*exp(-u));

% Initialize boundIter
boundIter = 0;
% Create a table for holding the results
resultTable = zeros(iters+1, 6);
% Iteration number
resultTable(1, 1) = 0;
% Error at iteration i
resultTable(1, 2) = E(w(1), w(2));
% The value of pEpu at initial w = (u,v)
resultTable(1, 3) = pEpu(w(1), w(2));
% The value of pEpv at initial w = (u,v)
resultTable(1, 4) = pEpv(w(1), w(2));
% Initial u
resultTable(1, 5) = w(1);
% Initial v
resultTable(1, 6) = w(2);

% Loop all the iterations
for i = 2:iters+1

    % Save the iteration number
    resultTable(i, 1) = i-1; 
    % Update the weights
    temp1 = w(1) - eta*(pEpu(w(1), w(2)));
    temp2 = w(2) - eta*(pEpv(w(1), w(2)));
    w(1) = temp1;
    w(2) = temp2;
    % Evaluate the error function at new weights
    resultTable(i, 2) = E(w(1), w(2));
    % Evaluate pEpu at the new point 
    resultTable(i, 3) = pEpu(w(1), w(2));
    % Evaluate pEpv at the new point
    resultTable(i, 4) = pEpv(w(1), w(2));
    % Save the new weights
    resultTable(i, 5) = w(1);
    resultTable(i, 6) = w(2);
    % If the error function drops below the specified bound, save the
    % first iteration index where this happens
    if boundIter == 0 && E(w(1), w(2)) < bound
        boundIter = i-1;
    end

end
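For cross-checking the numbers, here is a rough Python port of the same update loop (my own sketch, not part of the original MATLAB; the formulas are transcribed from the anonymous functions above):

```python
import math

# Error function and its partial derivatives, as in the MATLAB code above
E    = lambda u, v: (u*math.exp(v) - 2*v*math.exp(-u))**2
pEpu = lambda u, v: 2*(u*math.exp(v) - 2*v*math.exp(-u))*(math.exp(v) + 2*v*math.exp(-u))
pEpv = lambda u, v: 2*(u*math.exp(v) - 2*v*math.exp(-u))*(u*math.exp(v) - 2*math.exp(-u))

def gradient_descent(u, v, iters, eta):
    """Run `iters` simultaneous-update GD steps; return per-step errors and final point."""
    errors = [E(u, v)]
    for _ in range(iters):
        # Simultaneous update, matching the temp1/temp2 trick in the MATLAB code
        u, v = u - eta*pEpu(u, v), v - eta*pEpv(u, v)
        errors.append(E(u, v))
    return errors, (u, v)

errors, (u, v) = gradient_descent(2.0, 2.0, iters=5, eta=0.01)
# With these settings the point ends up roughly at (0.63, -1.67), and the
# error drops from about 202.7 to about 3.6, decreasing at every step.
```

If this trajectory matches your resultTable, the MATLAB loop itself is implementing gradient descent correctly.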

This is an exercise in my machine learning course, but for some reason my results are all wrong, so there must be something wrong in the code. I have debugged it over and over and haven't found anything wrong... Can someone identify the problem here? In other words, can you check that this is a valid gradient descent implementation for the given function?
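One way to check the code's validity yourself is to compare the analytic partial derivatives against central finite differences. A minimal sketch (in Python rather than MATLAB, with the same formulas; `h` and `rtol` are arbitrary demo choices):

```python
import math

E = lambda u, v: (u*math.exp(v) - 2*v*math.exp(-u))**2

# Analytic partials, transcribed from the MATLAB anonymous functions
pEpu = lambda u, v: 2*(u*math.exp(v) - 2*v*math.exp(-u))*(math.exp(v) + 2*v*math.exp(-u))
pEpv = lambda u, v: 2*(u*math.exp(v) - 2*v*math.exp(-u))*(u*math.exp(v) - 2*math.exp(-u))

def check_gradient(u, v, h=1e-6, rtol=1e-5):
    """Compare analytic partials with central differences at (u, v)."""
    num_u = (E(u + h, v) - E(u - h, v)) / (2*h)
    num_v = (E(u, v + h) - E(u, v - h)) / (2*h)
    ok_u = abs(num_u - pEpu(u, v)) <= rtol * max(1.0, abs(num_u))
    ok_v = abs(num_v - pEpv(u, v)) <= rtol * max(1.0, abs(num_v))
    return ok_u and ok_v

print(check_gradient(2.0, 2.0))  # the partials in the question pass this check
```

Since the analytic and numerical gradients agree, any disagreement with classmates has to come from the setup (starting point, eta, or the function itself), not the derivatives.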

Please let me know if my question is too unclear or if you need more info :)

Thank you for your effort and help! =)

Here are my results for five iterations, alongside what other people got:

PARAMETERS: w = [2,2], eta = 0.01, bound = 10^-14, iters = 5

[Image: table comparing my results with other students' results for five iterations]

As discussed in the answers below: I would say the others are wrong... your minimization leads to smaller values of E(u,v); check:

E(1.4,1.6) = 37.8 >> 3.6 = E(0.63, -1.67)
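Those two values are easy to spot-check (a quick Python evaluation of the same expression, rounded to one decimal):

```python
import math

E = lambda u, v: (u*math.exp(v) - 2*v*math.exp(-u))**2

print(round(E(1.4, 1.6), 1))     # 37.8
print(round(E(0.63, -1.67), 1))  # 3.6
```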

Not a complete answer, but let's go for it:

I added a plotting part to your code so you can see what's going on.

% Element-wise version of E so it can be evaluated on vectors and grids
Ev = @(u,v) (u.*exp(v) - 2*v.*exp(-u)).^2;
u1 = resultTable(:,5);
v1 = resultTable(:,6);
E1 = Ev(u1, v1);
E1(E1 < bound) = NaN;
[x, y] = meshgrid(-1:0.1:5, -5:0.1:2);
Z = Ev(x, y);
surf(x, y, Z)

hold on
plot3(u1,v1,E1,'r')
plot3(u1,v1,E1,'r*')

[Image: surface plot of E(u,v) with the descent path overlaid in red] The result shows that your algorithm is doing the right thing for that function. So, as others said, either all the others are wrong, or you are not using the right equation from the beginning.

(I apologize for not just commenting, but I'm new to SO and cannot comment.)

It appears that your algorithm is doing the right thing. What you want to be sure of is that at each step the energy is shrinking (which it is). There are several reasons why your data points may not agree with the others in the class: they could be wrong (you or the others), they may have started at a different point, or they may have used a different step size (what you are calling eta, I believe).

Ideally, you don't want to hard-code the number of iterations. You want to continue until you reach a local minimum (which hopefully is the global minimum). To check this, you want both partial derivatives to be zero (or very close). In addition, to make sure you're at a local min (not a local max or a saddle point), you should check the sign of E_uu*E_vv - E_uv^2 and the sign of E_uu; see http://en.wikipedia.org/wiki/Second_partial_derivative_test for details (the second derivative test, at the top). If you find yourself at a local max or saddle point, your gradient will tell you not to move (since the partial derivatives are 0). Since you know this isn't optimal, you have to perturb your solution (sometimes called simulated annealing).
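A stopping rule along those lines can be sketched in Python (my own illustration; the starting point (1,1), eta = 0.1, `tol`, and `max_iters` are arbitrary demo values, not the question's settings):

```python
import math

E = lambda u, v: (u*math.exp(v) - 2*v*math.exp(-u))**2

def grad(u, v):
    """Gradient of E, using the same partial-derivative formulas as the question."""
    f = u*math.exp(v) - 2*v*math.exp(-u)
    return (2*f*(math.exp(v) + 2*v*math.exp(-u)),
            2*f*(u*math.exp(v) - 2*math.exp(-u)))

def descend(u, v, eta, tol=1e-8, max_iters=100_000):
    """Iterate until the gradient is (nearly) zero instead of a fixed count."""
    for i in range(max_iters):
        gu, gv = grad(u, v)
        if math.hypot(gu, gv) < tol:   # both partials close to zero: stop
            return u, v, i
        u, v = u - eta*gu, v - eta*gv
    return u, v, max_iters

u, v, steps = descend(1.0, 1.0, eta=0.1)
# From this starting point the loop stops after a handful of steps,
# with E(u, v) far below the question's 10^-14 bound.
```

The same loop structure drops into the MATLAB code directly: replace the `for i = 2:iters+1` loop with a `while` loop guarded by the gradient norm.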

Hope this helps.
