
Understanding Gradient Descent Algorithm

I'm learning Machine Learning. I was reading about Linear Regression with one variable, and I got confused while trying to understand the Gradient Descent Algorithm.

Suppose we are given a problem with a Training Set in which each pair $(x^{(i)},y^{(i)})$ represents (feature/input variable, target/output variable). Our goal is to create a hypothesis function from this training set that can make predictions.

Hypothesis Function: $$h_{\theta}(x)=\theta_0 + \theta_1 x$$

Our target is to choose $(\theta_0,\theta_1)$ so that $h_{\theta}(x)$ best approximates the target values $y$ on the training set.

Cost Function: $$J(\theta_0,\theta_1)=\frac{1}{2m}\sum\limits_{i=1}^m (h_{\theta}(x^{(i)})-y^{(i)})^2$$

$$J(\theta_0,\theta_1)=\frac{1}{2}\times \text{Mean Squared Error}$$
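
As a minimal sketch in plain Python (no libraries), the hypothesis and cost function above can be written out directly. The function names `hypothesis` and `cost` and the four-point training set are illustrative assumptions, not something from the lecture.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum of squared errors over the training set."""
    m = len(xs)
    squared_errors = [(hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)


# Hypothetical toy training set lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(cost(1.0, 2.0, xs, ys))  # perfect fit: 0.0
print(cost(0.0, 0.0, xs, ys))  # (1 + 9 + 25 + 49) / 8 = 10.5
```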

We have to minimize $J(\theta_0,\theta_1)$ to find the values $(\theta_0,\theta_1)$ that we then put into our hypothesis function. We can do that by applying the Gradient Descent Algorithm to the plot $(\theta_0,\theta_1,J(\theta_0,\theta_1))$.
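
Differentiating the cost function with respect to each parameter gives the direction gradient descent steps in (the factor 2 from the square cancels the $\frac{1}{2}$). In the update rule, the learning rate $\alpha$ is a step-size hyperparameter that the question does not mention, and both parameters are updated simultaneously from the same old values:

$$\frac{\partial J}{\partial \theta_0}=\frac{1}{m}\sum\limits_{i=1}^m \left(h_{\theta}(x^{(i)})-y^{(i)}\right)$$

$$\frac{\partial J}{\partial \theta_1}=\frac{1}{m}\sum\limits_{i=1}^m \left(h_{\theta}(x^{(i)})-y^{(i)}\right)x^{(i)}$$

$$\theta_j := \theta_j-\alpha\frac{\partial J}{\partial \theta_j}\quad (j=0,1)$$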

My question is: how do we choose $(\theta_0,\theta_1)$ and plot the curve $(\theta_0,\theta_1,J(\theta_0,\theta_1))$? The instructor in the online lecture I was watching explained everything but didn't mention where the plot comes from.
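
One way such a 3-D plot can be produced: no single $(\theta_0,\theta_1)$ has to be chosen up front; you evaluate $J$ at every point of a grid of candidate values and plot the resulting heights. The sketch below assumes a hypothetical four-point training set and made-up helper names (`linspace`, `cost_surface`).

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def linspace(start, stop, num):
    """num evenly spaced values from start to stop inclusive (num >= 2)."""
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num)]


def cost_surface(xs, ys, theta0_values, theta1_values):
    """grid[i][j] = J(theta0_values[i], theta1_values[j]): the heights of the 3-D plot."""
    return [[cost(t0, t1, xs, ys) for t1 in theta1_values] for t0 in theta0_values]


# Hypothetical training set lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0_values = linspace(-2.0, 4.0, 7)  # -2, -1, ..., 4
theta1_values = linspace(0.0, 4.0, 5)   # 0, 1, ..., 4
grid = cost_surface(xs, ys, theta0_values, theta1_values)

# Lowest point on the sampled surface, as (cost, theta0, theta1).
lowest = min(
    (grid[i][j], theta0_values[i], theta1_values[j])
    for i in range(len(theta0_values))
    for j in range(len(theta1_values))
)
print(lowest)  # (0.0, 1.0, 2.0)
```

The grid of heights could then be handed to a 3-D surface or contour plotting routine (for example matplotlib's `plot_surface` or `contour`, which expect 2-D coordinate arrays such as those produced by `numpy.meshgrid`).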

At each iteration you will have some $h_\theta$, and you will calculate the value of $\frac{1}{2n}\sum_{(x,y)\in\text{train set}}(h_\theta(x)-y)^2$.
At each iteration $h_\theta$ is known, and the values $(x,y)$ of every training sample are known, so the above is easy to calculate.

For each iteration you have a new value of $\theta$, so you can calculate the new MSE.

The plot itself will have the iteration number on the x axis and the MSE on the y axis.
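
The procedure described in this answer can be sketched as batch gradient descent that records the cost after every update, so that `history[k]` is the y value plotted at iteration `k`. The learning rate `alpha = 0.1`, the iteration count, the zero initialization, and the toy data are assumptions chosen for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1*x.

    Returns the final parameters and the cost before the first update
    followed by the cost after each update.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    history = [cost(theta0, theta1, xs, ys)]
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old thetas.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append(cost(theta0, theta1, xs, ys))
    return theta0, theta1, history


# Hypothetical training set lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1, history = gradient_descent(xs, ys, alpha=0.1, iterations=1000)
print(f"theta0={theta0:.4f}, theta1={theta1:.4f}")  # should approach 1 and 2
print(history[0], history[-1])  # cost falls from 10.5 toward 0
```

Plotting `range(len(history))` against `history` gives the iteration-versus-MSE curve; if it rises instead of falling, the learning rate is usually too large.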

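As a side note, while you can use gradient descent, there is no need to: this cost function is convex, and its minimum is given by the well-known closed form $\theta = (X^TX)^{-1}X^Ty$, where $y$ is the $n\times 1$ vector of training targets (for a training set of size $n$) and $X$ is the $n\times 2$ matrix whose $i$-th row is $X_i=(1,x_i)$. The minimum is unique as long as $X^TX$ is invertible, i.e. the training set contains at least two distinct $x$ values.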
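
For the one-variable case $X^TX$ is only $2\times 2$, so the closed form can be sketched without a linear-algebra library by writing out the 2x2 inverse explicitly. The function name `normal_equation` and the toy data are hypothetical; with NumPy one would more typically call `numpy.linalg.solve` or `numpy.linalg.lstsq` rather than forming an explicit inverse.

```python
def normal_equation(xs, ys):
    """Solve theta = (X^T X)^{-1} X^T y for X with rows (1, x_i), written out for the 2x2 case."""
    if len(set(xs)) < 2:
        # With fewer than two distinct x values, X^T X is singular.
        raise ValueError("need at least two distinct x values")
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sum_x], [sum_x, sum_xx]] and X^T y = [sum_y, sum_xy].
    det = n * sum_xx - sum_x ** 2
    theta0 = (sum_xx * sum_y - sum_x * sum_xy) / det
    theta1 = (n * sum_xy - sum_x * sum_y) / det
    return theta0, theta1


# Hypothetical training set lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(normal_equation(xs, ys))  # (1.0, 2.0)
```

This recovers in one step the same parameters that gradient descent only approaches iteratively.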
