Gradient Descent Matlab implementation

I have gone through many gradient descent codes on Stack Overflow and written my own along the same lines. There is some problem with this code that I am unable to understand. I am storing the values theta1 and theta2, and also the cost function, for analysis purposes. The data for x and y can be downloaded from this Openclassroom page. It has the x and y data in the form of .dat files that you can open in Notepad.

% Single Variate Gradient Descent Algorithm
clc
clear all
close all;

% Step 1: Load the x series (input data) and the y series (output data)
x=load('D:\Office Docs_Jay\software\ex2x.dat');
y=load('D:\Office Docs_Jay\software\ex2y.dat');

% Plot the input vectors
plot(x,y,'o');
ylabel('Height in meters');
xlabel('Age in years');

% Step 2: Add an extra column of ones to the input vector
[m n]=size(x);
X=[ones(m,1) x]; % Concatenate the ones column with x

% Step 3: Create the theta vector
theta=zeros(n+1,1); % theta 0,1

% Create temporary values for storing the summation
temp1=0;
temp2=0;

% Define the learning rate alpha and the max iterations
alpha=0.07;
max_iterations=1;

% Step 4: Iterate over the loop
for i=1:1:max_iterations

    % Calculate the hypothesis for all training examples
    for k=1:1:m
        h(k)=theta(1,1)+theta(2,1)*X(k,2); %#ok<AGROW>
        temp1=temp1+(h(k)-y(k));
        temp2=temp2+(h(k)-y(k))*X(k,2);
    end

    % Simultaneous update
    tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
    tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);
    theta(1,1)=tmp1;
    theta(2,1)=tmp2;
    theta1_history(i)=theta(2,1); %#ok<AGROW>
    theta0_history(i)=theta(1,1); %#ok<AGROW>

    % Step 5: Calculate the cost function
    tmp3=0;
    tmp4=0;
    for p=1:m
        tmp3=tmp3+theta(1,1)+theta(2,1)*X(p,1);
        tmp4=tmp4+theta(1,1)+theta(2,1)*X(p,2);
    end
    J1_theta0(i)=tmp3*(1/(2*m)); %#ok<AGROW>
    J2_theta1(i)=tmp4*(1/(2*m)); %#ok<AGROW>

end

theta
hold on;
plot(X(:,2),theta(1,1)+theta(2,1)*X);

I am getting the value of theta as 0.0373 and 0.1900; it should be 0.0745 and 0.3800.

The expected values are approximately double what I am getting.

I have been trying to implement the iterative step with matrices and vectors (i.e., not updating each parameter of theta separately). Here is what I came up with (only the gradient step is shown):

h = X * theta;                          % hypothesis
err = h - y;                            % error
gradient = alpha * (1/m) * (X' * err);  % learning rate times the gradient
theta = theta - gradient;               % update theta

The hard part to grasp is that the "sum" in the gradient step of the previous examples is actually performed by the matrix multiplication X'*err. You can also write it as (err'*X)'.
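To convince yourself of this, here is a minimal sketch (assuming X, y, theta, and m are already defined as in the question, with X containing the column of ones) that computes the sum both ways:

% Minimal sketch: the explicit loop sum and X'*err give the same result.
h = X * theta;        % hypothesis for all m training examples at once
err = h - y;          % m-by-1 vector of errors

% Loop version: accumulate err(k) * x^(k) over all examples
g_loop = zeros(size(theta));
for k = 1:m
    g_loop = g_loop + err(k) * X(k,:)';
end

% Vectorized version: the matrix product performs the same summation
g_vec = X' * err;

disp(max(abs(g_loop - g_vec)));   % zero up to floating-point rounding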

I managed to create an algorithm that uses more of the vectorized properties that Matlab supports. My algorithm is a little different from yours, but it performs the gradient descent process as you ask. After the execution and the validation (using the polyfit function) that I did, I think that the values expected by openclassroom (exercise 2), theta(0) = 0.0745 and theta(1) = 0.3800, are wrong after 1500 iterations with step 0.07 (I take no responsibility for that). This is why I plotted my results with the data in one plot and your expected results with the data in another plot, and I saw a big difference in the data fitting procedure.

First of all, have a look at the code:

% Machine Learning : Linear Regression

clear all; close all; clc;

%% ======================= Plotting Training Data =======================
fprintf('Plotting Data ...\n')

x = load('ex2x.dat');
y = load('ex2y.dat');

% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label

%% =================== Initialize Linear regression parameters ===================
m = length(y); % number of training examples

% initialize fitting parameters - all zeros
theta=zeros(2,1);%theta 0,1

% Some gradient descent settings
iterations = 1500;
Learning_step_a = 0.07; % step parameter

%% =================== Gradient descent ===================

fprintf('Running Gradient Descent ...\n')

%Compute Gradient descent

% Initialize Objective Function History
J_history = zeros(iterations, 1);

% run gradient descent    
for iter = 1:iterations

   % In every iteration calculate hypothesis
   hypothesis=theta(1).*x+theta(2);

   % Update theta variables
   temp0=theta(1) - Learning_step_a * (1/m)* sum((hypothesis-y).* x);
   temp1=theta(2) - Learning_step_a * (1/m) *sum(hypothesis-y);

   theta(1)=temp0;
   theta(2)=temp1;

   % Save objective function 
   J_history(iter)=(1/(2*m))*sum((hypothesis-y).^2); % note the parentheses: 1/(2*m), not (1/2*m)

end

% print theta to screen
fprintf('Theta found by gradient descent: %f %f\n',theta(1),  theta(2));
fprintf('Minimum of objective function is %f \n',J_history(iterations));

% Plot the linear fit
hold on; % keep previous plot visible 
plot(x, theta(1)*x+theta(2), '-')

% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');
legend('Training data', 'Linear regression','Linear regression with polyfit')
hold off 

figure
% Plot Data
plot(x,y,'rx');
xlabel('X -> Input') % x-axis label
ylabel('Y -> Output') % y-axis label

hold on; % keep previous plot visible
% Validate with polyfit fnc
poly_theta = polyfit(x,y,1);
plot(x, poly_theta(1)*x+poly_theta(2), 'y--');

% for theta values that you are saying
theta(1)=0.0745;  theta(2)=0.3800;
plot(x, theta(1)*x+theta(2), 'g--')
legend('Training data', 'Linear regression with polyfit','Your thetas')
hold off 

OK, the results are as follows:

With the theta(0) and theta(1) produced by my algorithm, the line fits the data:

Gradient descent - theta0 = 0.063883, theta1 = 0.750150

With theta(0) and theta(1) fixed to the values you expect, the line does not fit the data:

Gradient descent - theta0 = 0.0745, theta1 = 0.3800

Here are some comments:

1. max_iterations is set to 1. Gradient descent is typically run until either the decrease in the objective function is below some threshold or the magnitude of the gradient is below some threshold, which will likely take more than one iteration.

2. The factor of 1/(2*m) is not technically correct. This should not cause the algorithm to fail, but it will effectively decrease the learning rate.

3. You are not computing the correct objective. The correct linear regression objective should be either one-half times the average of the squared residuals or one-half times the sum of the squared residuals.

4. Rather than using for-loops, you should take advantage of Matlab's vectorized computations. For instance, res = X*theta - y; obj = .5/m*(res'*res); should compute the residuals (res) and the linear regression objective (obj). A sketch combining this with point 1 follows this list.
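For points 1 and 4 together, here is a minimal sketch of vectorized gradient descent with a convergence-based stopping rule, assuming X (with the column of ones), y, alpha, and theta are set up as in the question; tol and max_iterations are illustrative values I chose:

% Vectorized gradient descent that stops when the objective stops
% decreasing, instead of running a fixed number of iterations.
tol = 1e-9;
max_iterations = 10000;
m = length(y);
prev_obj = Inf;
for iter = 1:max_iterations
    res = X*theta - y;               % residuals
    obj = 0.5/m*(res'*res);          % linear regression objective
    if abs(prev_obj - obj) < tol     % stop once the decrease is tiny
        break;
    end
    prev_obj = obj;
    theta = theta - alpha*(1/m)*(X'*res);   % gradient step
end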

You need to put temp1=0; temp2=0; as the first statements inside the iteration loop; if you don't, the current temp values carry over and influence the next iteration, which is wrong.
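Applied to the question's code, the top of the outer loop would read like this (the rest of the body is unchanged):

for i=1:1:max_iterations
    temp1=0; % reset the accumulators at the start of every iteration
    temp2=0;
    for k=1:1:m
        h(k)=theta(1,1)+theta(2,1)*X(k,2); %#ok<AGROW>
        temp1=temp1+(h(k)-y(k));
        temp2=temp2+(h(k)-y(k))*X(k,2);
    end
    % ... simultaneous update and cost calculation as before
end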

From the values of θ (theta) you expect and the program's outcome, one thing can be noticed: the expected value is twice that of the outcome.

The possible mistake you made is that you used 1/(2*m) in place of 1/m in the derivative calculation. In the derivative, the 2 in the denominator vanishes, because the original term was (h_θ(x) - y)^2, which on differentiation generates 2*(h_θ(x) - y). The 2s cancel out.
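Spelled out (a standard derivation of the least-squares gradient, added here for clarity), in LaTeX:

J(\theta) = \frac{1}{2m} \sum_{k=1}^{m} \left( h_\theta(x^{(k)}) - y^{(k)} \right)^2
\qquad\Longrightarrow\qquad
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{k=1}^{m} \left( h_\theta(x^{(k)}) - y^{(k)} \right) x_j^{(k)}

The factor of 2 produced by differentiating the square cancels the 2 in the denominator, so the update step uses 1/m even though the cost itself keeps 1/(2m).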

Modify these code lines:

tmp1=theta(1,1)-(alpha*1/(2*m)*temp1);
tmp2=theta(2,1)-(alpha*(1/(2*m))*temp2);

to

tmp1=theta(1,1)-(alpha*(1/m)*temp1);
tmp2=theta(2,1)-(alpha*(1/m)*temp2);

Hope it helps.
