
Simultaneously update theta0 and theta1 to calculate gradient descent in Python

I am taking the machine learning course from Coursera. There is a topic called gradient descent for optimizing the cost function. It says to simultaneously update theta0 and theta1 so that the cost function is minimized and reaches the global minimum.

The formula for gradient descent is:

theta_j := theta_j - alpha * (1/m) * sum_{i=1}^{m} (h_theta(x^(i)) - y^(i)) * x_j^(i),   for j = 0 and j = 1 updated simultaneously, where h_theta(x) = theta0 + theta1 * x

How do I do this programmatically using Python? I am using a NumPy array and pandas to start from scratch and understand its logic step by step.

For now I have only calculated the cost function:

# imports needed for the code below
import numpy as np
import pandas as pd

# step 1 - collect our data
data = pd.read_csv("datasets.txt", header=None)

def compute_cost_function(x, y, theta):
    '''
        Taking in a numpy array x, y, theta and generate the cost function
    '''
    m = len(y)
    # formula for prediction = theta0 + theta1.x
    predictions = x.dot(theta)
    # formula for square error = ((theta1.x + theta0) - y)**2
    square_error = (predictions - y)**2
    # sum of square error function
    return 1/(2*m) * np.sum(square_error)

# convert to the NumPy representation of the pandas DataFrame; the axis labels are excluded
numpy_data = data.values
# number of training examples
m = data[0].size
# prepend a column of ones so that theta0 acts as the intercept term
x = np.append(np.ones((m, 1)), numpy_data[:, 0].reshape(m, 1), axis=1)
y = numpy_data[:, 1].reshape(m, 1)
theta = np.zeros((2, 1))

compute_cost_function(x, y, theta)
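As a quick sanity check (the tiny dataset below is made up purely for illustration), the cost should be zero for a perfect fit and positive otherwise:

x_check = np.array([[1., 1.], [1., 2.], [1., 3.]])       # bias column plus one feature
y_check = np.array([[2.], [4.], [6.]])
print(compute_cost_function(x_check, y_check, np.zeros((2, 1))))        # (4 + 16 + 36) / (2 * 3) ≈ 9.33
print(compute_cost_function(x_check, y_check, np.array([[0.], [2.]])))  # 0.0, since y = 2 * x exactly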

def gradient_descent(x, y, theta, alpha):
    '''
        simultaneously update theta0 and theta1 where
        theta_j = theta_j - alpha * (1/m) * (sum of errors * x_j)
    '''
    pass

I know I have to call compute_cost_function from gradient_descent but could not apply that formula.

What this means is that you use the previous values of the parameters and compute what you need on the right-hand side. Once you're done, update the parameters. To do this most clearly, create a temporary array inside your function that stores the results of the right-hand side, and return the computed result when you're finished.

def gradient_descent(x, y, theta, alpha):
    ''' simultaneously update theta0 and theta1 where
        theta_j = theta_j - alpha * (1/m) * sum((h(x) - y) * x_j) '''
    theta_return = np.zeros((2, 1))
    # intercept: sum of raw errors
    theta_return[0] = theta[0] - (alpha / m) * ((x.dot(theta) - y).sum())
    # slope: errors weighted by the input values in the second column of x
    theta_return[1] = theta[1] - (alpha / m) * (((x.dot(theta) - y)*x[:, 1][:, None]).sum())

    return theta_return

We first declare the temporary array, then compute the two parts of the parameters, namely the intercept and the slope, separately, and return what we need. The nice thing about the above code is that it is vectorized. For the intercept term, x.dot(theta) performs a matrix-vector multiplication between your data matrix x and parameter vector theta. By subtracting the output values y from this result, we compute the sum over all errors between the predicted and true values, then multiply by the learning rate and divide by the number of samples. We do something similar for the slope term, except that we additionally multiply by each input value, without the bias term. We also need to ensure that the input values are in a column, because slicing along the second column of x yields a 1D NumPy array instead of a 2D array with a singleton column. This lets the elementwise multiplication broadcast correctly.
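As a quick illustration of that last point, here is a minimal shape check (assuming x, y and theta were built as above):

errors = x.dot(theta) - y                    # shape (m, 1)
print(x[:, 1].shape)                         # (m,)   -- a 1D slice; multiplying by errors would broadcast to (m, m)
print(x[:, 1][:, None].shape)                # (m, 1) -- singleton column
print((errors * x[:, 1][:, None]).shape)     # (m, 1) -- elementwise product keeps the column shape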

One more thing to note is that you don't need to compute the cost at all when updating the parameters. That said, inside your optimization loop it is nice to call it as you update the parameters, so that you can see how well your parameters are learning from your data.
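For example, a minimal sketch of such a loop (the learning rate and iteration count below are just placeholder values):

alpha = 0.01
iterations = 1000
cost_history = []
for i in range(iterations):
    theta = gradient_descent(x, y, theta, alpha)
    # track the cost at every step; it should decrease if alpha is chosen sensibly
    cost_history.append(compute_cost_function(x, y, theta))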


To make this truly vectorized, and thus exploit the simultaneous update, you can formulate it as a matrix-vector multiplication over the training examples alone:

def gradient_descent(x, y, theta, alpha):
    ''' simultaneously update theta0 and theta1 where
        theta_j = theta_j - alpha * (1/m) * sum((h(x) - y) * x_j) '''
    # x.T.dot(errors) sums the raw errors via the first row of ones (theta0 update)
    # and the input-weighted errors via the second row (theta1 update)
    return theta - (alpha / m) * x.T.dot(x.dot(theta) - y)

What this does is the following: when we compute x.dot(theta), it calculates the predicted values, and subtracting the expected values from this produces the error vector. When we pre-multiply by the transpose of x, what ends up happening is that we take the error vector and perform the summation in vectorized form. The first row of the transposed matrix x consists entirely of 1s, meaning we simply sum up all of the error terms, which gives us the update for the bias or intercept term. Similarly, the second row of the transposed matrix x weights each error term by the corresponding sample value in x (without the bias term of 1) and computes the sum that way. The result is a 2 x 1 vector that, once scaled by the learning rate and the number of samples and subtracted from the previous value of the parameters, gives us the final update.
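As a sanity check (purely illustrative; the random data and the *_demo names below are hypothetical), both formulations should produce the same update:

rng = np.random.default_rng(0)
x_demo = np.append(np.ones((5, 1)), rng.random((5, 1)), axis=1)
y_demo = rng.random((5, 1))
theta_demo = np.zeros((2, 1))
alpha_demo = 0.01
m_demo = len(y_demo)

# elementwise version
elementwise = np.zeros((2, 1))
elementwise[0] = theta_demo[0] - (alpha_demo / m_demo) * ((x_demo.dot(theta_demo) - y_demo).sum())
elementwise[1] = theta_demo[1] - (alpha_demo / m_demo) * (((x_demo.dot(theta_demo) - y_demo) * x_demo[:, 1][:, None]).sum())

# fully vectorized version
vectorized = theta_demo - (alpha_demo / m_demo) * x_demo.T.dot(x_demo.dot(theta_demo) - y_demo)
print(np.allclose(elementwise, vectorized))   # True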


I didn't realize you were putting the code in an iterative framework. In that case, you need to update the parameters at each iteration.

def gradient_descent(x, y, theta, alpha, iterations):
    ''' simultaneously update theta0 and theta1 where
        theta_j = theta_j - alpha * (1/m) * sum((h(x) - y) * x_j) '''
    for i in range(iterations):
        # allocate a fresh array each iteration so theta and theta_return never
        # alias each other and the update stays truly simultaneous
        theta_return = np.zeros((2, 1))
        theta_return[0] = theta[0] - (alpha / m) * ((x.dot(theta) - y).sum())
        theta_return[1] = theta[1] - (alpha / m) * (((x.dot(theta) - y)*x[:, 1][:, None]).sum())
        theta = theta_return

    return theta

theta = gradient_descent(x, y, theta, 0.01, 1000)

At each iteration you update the parameters, then reassign theta so that on the next pass the current updates become the previous values.
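After this call, a minimal sketch of how you might inspect the result (the prediction input of 7 is just a hypothetical value):

print(theta)                                   # learned [theta0, theta1]
print(compute_cost_function(x, y, theta))      # should be lower than the initial cost
# predict for a single new input value, including the bias term of 1
prediction = np.array([[1, 7]]).dot(theta)
print(prediction)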
