
Understanding Gradient Descent Algorithm

Original, 2015-04-05 17:35:15. Tags: algorithm, machine-learning

I'm learning machine learning. I was reading about linear regression with one variable, and I got confused trying to understand the gradient descent algorithm.

Suppose we are given a training set in which each pair $(x^{(i)},y^{(i)})$ consists of a feature (input variable) and a target (output variable). Our goal is to build a hypothesis function from this training set that can make predictions.

Hypothesis function: $$h_{\theta}(x)=\theta_0 + \theta_1 x$$
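As a minimal Python sketch, the hypothesis is just a straight line evaluated at one input. The function name `hypothesis` and its argument order are illustrative choices, not something from the lecture.

```python
def hypothesis(theta0: float, theta1: float, x: float) -> float:
    """Return h_theta(x) = theta0 + theta1 * x for a single input x."""
    return theta0 + theta1 * x


# With theta0 = 1 and theta1 = 2, the prediction for x = 3 is 1 + 2*3 = 7.
print(hypothesis(1.0, 2.0, 3.0))  # 7.0
```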

Our goal is to choose $(\theta_0,\theta_1)$ so that $h_{\theta}(x^{(i)})$ is as close as possible to $y^{(i)}$ on the training set.

Cost function: $$J(\theta_0,\theta_1)=\frac{1}{2m}\sum\limits_{i=1}^m \left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2$$ where $m$ is the number of training examples.
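The cost function translates almost symbol for symbol into code. This is a sketch assuming the training set is given as two equal-length Python lists; the name `cost` and the toy data are illustrative.

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum_i (h(x_i) - y_i)^2 over m training pairs."""
    m = len(xs)
    if m == 0 or m != len(ys):
        raise ValueError("xs and ys must be non-empty and the same length")
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(0.0, 2.0, xs, ys))  # 0.0, the line y = 2x fits these points exactly
print(cost(0.0, 1.0, xs, ys))  # (1 + 4 + 9) / 6, about 2.333
```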

$$J(\theta_0,\theta_1)=\frac{1}{2}\times\text{Mean Squared Error}$$
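A small worked check of this relation, using toy numbers chosen only for illustration: take $m=2$, training pairs $(1,2)$ and $(2,4)$, and $\theta_0=0,\ \theta_1=1$.

$$h_\theta(1)=1,\qquad h_\theta(2)=2,\qquad \text{errors: } -1,\ -2$$

$$\text{MSE}=\frac{1}{2}\left((-1)^2+(-2)^2\right)=2.5,\qquad J(0,1)=\frac{1}{2\cdot 2}\left(1+4\right)=1.25=\frac{1}{2}\times 2.5$$

The extra factor $\frac{1}{2}$ does not move the minimum; it only cancels the 2 that appears when $J$ is differentiated.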

We minimize $J(\theta_0,\theta_1)$ to find the values of $(\theta_0,\theta_1)$ to put into the hypothesis function, so that its predictions fit the training data as well as possible. We can do that by applying the gradient descent algorithm to the surface $(\theta_0,\theta_1,J(\theta_0,\theta_1))$.
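A minimal sketch of batch gradient descent on this cost, assuming the standard update rules obtained by differentiating $J$: $\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_i \left(h_\theta(x^{(i)})-y^{(i)}\right)$ and $\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_i \left(h_\theta(x^{(i)})-y^{(i)}\right)x^{(i)}$. The function name, the learning rate $\alpha=0.1$, the iteration count, and the toy data are illustrative choices, not values from the lecture.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000, theta0=0.0, theta1=0.0):
    """Batch gradient descent on J(theta0, theta1) = 1/(2m) * sum (h(x) - y)^2."""
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 1 + 2x
print(gradient_descent(xs, ys))  # close to (1.0, 2.0)
```

If $\alpha$ is too large the updates overshoot and diverge; if it is too small, convergence is slow.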

My question is: how do we choose $(\theta_0,\theta_1)$ and plot the curve $(\theta_0,\theta_1,J(\theta_0,\theta_1))$? In the online lecture I was watching, the instructor explained everything but didn't mention where the plot comes from.
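One common way to obtain the surface $(\theta_0,\theta_1,J(\theta_0,\theta_1))$ is to evaluate $J$ on a grid of candidate parameter pairs; nothing about it comes from the data alone. A sketch under that assumption, with a hypothetical `cost_surface` helper and toy data. The resulting triples could be arranged into 2-D arrays and passed to a 3D plotting routine such as Matplotlib's `plot_surface`.

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum (theta0 + theta1*x - y)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def cost_surface(xs, ys, theta0_values, theta1_values):
    """Return (theta0, theta1, J) triples for every pair on the grid."""
    return [
        (t0, t1, cost(t0, t1, xs, ys))
        for t0 in theta0_values
        for t1 in theta1_values
    ]


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
grid = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
points = cost_surface(xs, ys, grid, grid)
print(len(points))  # 81
print(min(points, key=lambda p: p[2]))  # (0.0, 2.0, 0.0)
```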

1 answer

At each iteration you have some $h_\theta$, and you calculate the value of $$\frac{1}{2n}\sum_{(x,y)\in\text{train set}}\left(h_\theta(x)-y\right)^2,$$ where $n$ is the size of the training set.
At each iteration $h_\theta$ is known, and the values $(x,y)$ of every training sample are known, so this is easy to calculate.

At each iteration you have a new value of $\theta$, so you can calculate the new MSE.

The plot itself has the iteration number on the x-axis and the MSE on the y-axis.
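The iteration-versus-cost curve described here can be produced by recording the cost after every update. A sketch, assuming plain batch gradient descent started from $\theta=(0,0)$; `cost_history`, the learning rate, and the toy data are hypothetical choices. Feeding the returned pairs to any line-plot routine gives the curve.

```python
def cost(theta0, theta1, xs, ys):
    """1/(2n) * sum (h(x) - y)^2, using n for the training-set size as above."""
    n = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def cost_history(xs, ys, alpha=0.1, iterations=50):
    """Run batch gradient descent and return [(iteration, cost)] pairs for plotting."""
    n = len(xs)
    theta0 = theta1 = 0.0
    history = [(0, cost(theta0, theta1, xs, ys))]  # cost before any update
    for it in range(1, iterations + 1):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Tuple assignment keeps the update simultaneous.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append((it, cost(theta0, theta1, xs, ys)))
    return history


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
history = cost_history(xs, ys)
for it, j in history[:3]:
    print(it, round(j, 4))
```

With a suitably small learning rate the recorded cost should fall at every step; a curve that rises is a common sign that $\alpha$ is too large.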

As a side note, while you can use gradient descent here, there is no real need to. This cost function is convex and has a single, well-known minimum: $\theta = (X^TX)^{-1}X^Ty$, where $y$ is the $n\times 1$ vector of target values for a training set of size $n$, and $X$ is the $n\times 2$ matrix whose $i$-th row is $X_i=(1,x_i)$. (This assumes $X^TX$ is invertible, which holds when the training set contains at least two distinct $x$ values.)
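For a single feature, $X^TX$ is only $2\times 2$, so the closed form can be computed without a linear-algebra library. A minimal sketch; `normal_equation` is a hypothetical name, and on toy data generated from $y=1+2x$ it recovers the parameters exactly. For many features, solving the least-squares problem with a library routine (for example NumPy's `numpy.linalg.lstsq`) is generally preferred over forming an explicit inverse.

```python
def normal_equation(xs, ys):
    """Closed-form least squares theta = (X^T X)^{-1} X^T y for rows X_i = (1, x_i)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular: need at least two distinct x values")
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det, applied to X^T y.
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
print(normal_equation(xs, ys))  # (1.0, 2.0)
```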

Related questions:

Gradient Descent algorithm not converging in Haskell

Gradient Descent algorithm not converging for linear regression

What is wrong with my Gradient Descent algorithm

MATLAB code not working (gradient descent algorithm)

Gradient descent algorithm giving incorrect answer in matlab

Gradient Descent Algorithm And Different Learning Rates

Trying to Implement Gradient Descent Algorithm with Fixed Step Size

Gradient Descent algorithm taking long time to complete - Efficiency - Python

What determines whether my Python gradient descent algorithm converges?

Gradient descent algorithm error non-comformable arguments


The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


Solution 1 (score 2, accepted), 2015-04-05 18:03:52
