
How to create a simple Gradient Descent algorithm

I'm studying simple machine learning algorithms, beginning with simple gradient descent, but I'm having some trouble implementing it in Python.

Here is the example I'm trying to reproduce. I have data about houses (living area in feet², and number of bedrooms) with the resulting price:

Living area (feet²): 2104

Number of bedrooms: 3

Price ($1000s): 400

I'm trying to do a simple regression using the gradient descent method, but my algorithm won't work. The algorithm deliberately avoids vectors, because I'm trying to understand it step by step.

import sys
import random

i = 1
derror = sys.maxint   # start with an "infinite" previous change in error
error = 0
step = 0.0001         # learning rate
dthresh = 0.1         # stop when the error changes by less than this

theta1 = random.random()
theta2 = random.random()
theta0 = random.random()
while derror > dthresh:
    # residual for the single training point (2104, 3) -> 400
    diff = 400 - theta0 - 2104 * theta1 - 3 * theta2
    theta0 = theta0 + step * diff * 1
    theta1 = theta1 + step * diff * 2104
    theta2 = theta2 + step * diff * 3
    hserror = diff**2/2    # squared-error cost for this one point
    derror = abs(error - hserror)
    error = hserror
    print 'iteration : %d, error : %s' % (i, error)
    i += 1

I understand the math. I'm constructing a prediction function $$h_{\theta}(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$ with $x_1$ and $x_2$ being the variables (living area, number of bedrooms) and $h_{\theta}(x)$ the estimated price.

I'm using the cost function ($hserror$) for one point: $$hserror = \frac{1}{2}(h_{\theta}(x) - y)^2$$ This is a usual problem, but I'm more of a software engineer and I'm learning one step at a time. Can you tell me what's wrong?
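For reference, differentiating this cost with respect to each $\theta_j$ (with $x_0 = 1$ for the intercept term) gives $$\frac{\partial\, hserror}{\partial \theta_j} = -(y - h_{\theta}(x))\, x_j$$ so the update $\theta_j \leftarrow \theta_j - step \cdot \partial hserror / \partial \theta_j$ becomes $\theta_j \leftarrow \theta_j + step \cdot diff \cdot x_j$ with $diff = y - h_{\theta}(x)$, which is exactly what the loop above computes.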

I got it working with this code:

import sys
import random

data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369,
        (1416, 2): 232, (3000, 4): 540}
for x in range(10):    # ten runs from different random starting points
    i = 1
    derror = sys.maxint
    error = 0
    step = 0.00000001         # much smaller learning rate
    dthresh = 0.0000000001    # much tighter stopping threshold

    theta1 = random.random() * 100
    theta2 = random.random() * 100
    theta0 = random.random() * 100
    while derror > dthresh:
        # note: still fits only the single point (2104, 3) -> 400
        diff = 400 - (theta0 + 2104 * theta1 + 3 * theta2)
        theta0 = theta0 + step * diff * 1
        theta1 = theta1 + step * diff * 2104
        theta2 = theta2 + step * diff * 3
        hserror = diff**2/2
        derror = abs(error - hserror)
        error = hserror
        #print 'iteration : %d, error : %s, derror : %s' % (i, error, derror)
        i += 1
    print ' theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2)
    print ' done : %f' % (theta0 + 2104 * theta1 + 3 * theta2)

which ends up with answers like this:

 theta0 : 48.412337, theta1 : 0.094492, theta2 : 50.925579
 done : 400.000043
 theta0 : 0.574007, theta1 : 0.185363, theta2 : 3.140553
 done : 400.000042
 theta0 : 28.588457, theta1 : 0.041746, theta2 : 94.525769
 done : 400.000043
 theta0 : 42.240593, theta1 : 0.096398, theta2 : 51.645989
 done : 400.000043
 theta0 : 98.452431, theta1 : 0.136432, theta2 : 4.831866
 done : 400.000043
 theta0 : 18.022160, theta1 : 0.148059, theta2 : 23.487524
 done : 400.000043
 theta0 : 39.461977, theta1 : 0.097899, theta2 : 51.519412
 done : 400.000042
 theta0 : 40.979868, theta1 : 0.040312, theta2 : 91.401406
 done : 400.000043
 theta0 : 15.466259, theta1 : 0.111276, theta2 : 50.136221
 done : 400.000043
 theta0 : 72.380926, theta1 : 0.013814, theta2 : 99.517853
 done : 400.000043

The first issue is that running this with only one piece of data gives you an underdetermined system, which means it may have an infinite number of solutions. With three variables, you'd expect to have at least 3 data points, and preferably many more.
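As an illustration (a minimal sketch, not the poster's code; the step size and iteration count here are assumptions), the same update summed over all five points in the data dictionary removes the underdetermination, though convergence on the intercept and bedroom terms is still very slow because the features are unscaled:

data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369,
        (1416, 2): 232, (3000, 4): 540}

step = 0.00000001    # tiny because living area is in the thousands
theta0 = theta1 = theta2 = 0.0

for it in range(500000):
    # accumulate the residual-weighted gradient over every training point
    g0 = g1 = g2 = 0.0
    for (x1, x2), y in data.items():
        diff = y - (theta0 + theta1 * x1 + theta2 * x2)
        g0 += diff
        g1 += diff * x1
        g2 += diff * x2
    theta0 += step * g0
    theta1 += step * g1
    theta2 += step * g2

for (x1, x2), y in data.items():
    print('area %d, beds %d : predicted %.1f, actual %d'
          % (x1, x2, theta0 + theta1 * x1 + theta2 * x2, y))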

Secondly, using gradient descent where the step size is a scaled version of the gradient is not guaranteed to converge except in a small neighbourhood of the solution. You can fix that by switching to either a fixed-size step in the direction of the negative gradient (slow), or a line search in the direction of the negative gradient (faster, but slightly more complicated; a sketch follows the fixed-step code below).

So for a fixed step size, instead of

theta0 = theta0 - step * dEdtheta0
theta1 = theta1 - step * dEdtheta1
theta2 = theta2 - step * dEdtheta2

you do this:

n = max([abs(dEdtheta0), abs(dEdtheta1), abs(dEdtheta2)])   # infinity norm of the gradient
theta0 = theta0 - step * dEdtheta0 / n
theta1 = theta1 - step * dEdtheta1 / n
theta2 = theta2 - step * dEdtheta2 / n
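For the line-search variant mentioned above, here is a hedged sketch (the function name, tolerance, and shrink factor are my own, not part of this answer) of backtracking along the negative gradient:

def backtracking_step(thetas, grads, error_fn, step=1.0, shrink=0.5):
    # Try a full step along the negative gradient, halving the step
    # until the error actually decreases; error_fn maps a list of
    # parameters to the scalar cost.
    e0 = error_fn(thetas)
    while step > 1e-12:
        trial = [t - step * g for t, g in zip(thetas, grads)]
        if error_fn(trial) < e0:
            return trial
        step *= shrink
    return thetas    # no improving step found; give up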

It also looks like you may have a sign error in your steps.

I'm also not sure that derror is a good stopping criterion. (But stopping criteria are notoriously hard to get "right".)
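One common alternative (my suggestion, not part of the original answer) is to stop when the gradient itself is small, rather than when the error stops changing:

def converged(grads, tol=1e-6):
    # Declare convergence when every gradient component is small,
    # instead of comparing successive error values the way derror does.
    return max(abs(g) for g in grads) < tol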

My final point is that gradient descent is horribly slow for parameter fitting. You probably want to use conjugate-gradient or Levenberg-Marquardt methods instead. I suspect that both of these methods already exist for Python in the numpy or scipy packages (which aren't part of Python by default but are pretty easy to install).
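For instance, scipy.optimize.leastsq wraps MINPACK's Levenberg-Marquardt implementation; here is a minimal sketch on the five points above (the array setup is my own illustration):

import numpy as np
from scipy.optimize import leastsq

data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369,
        (1416, 2): 232, (3000, 4): 540}
X = np.array([list(k) for k in data])               # 5 x 2 feature matrix
y = np.array([data[k] for k in data], dtype=float)  # matching prices

def residuals(theta):
    # vector of y - h_theta(x) over all five points
    return y - (theta[0] + theta[1] * X[:, 0] + theta[2] * X[:, 1])

theta, ier = leastsq(residuals, np.zeros(3))
print('theta : %s' % theta)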
