
Batch Gradient Descent with Python not converging

Here's the Jupyter Notebook I used for this practice: https://drive.google.com/file/d/18-OXyvXSit5x0ftiW9bhcqJrO_SE22_S/view?usp=sharing

I was practicing simple linear regression with this data set, and here are my parameters:

sat = np.array(data['SAT'])
gpa = np.array(data['GPA'])
theta_0 = 0.01
theta_1 = 0.01
alpha = 0.003
cost = 0
m = len(gpa)

I tried to optimize the cost function calculation by turning it into matrix form and performing element-wise operations. This is the resulting formula I came up with:

Cost function optimization (shown as an image in the original post; it is the vectorized form implemented below):

J(theta_0, theta_1) = (1 / (2m)) * sum((theta_0 + theta_1 * x_i - y_i)^2)

Cost function:

def calculateCost(matrix_x,matrix_y,m):
    global theta_0,theta_1
    cost = (1 / (2 * m)) * ((theta_0 + (theta_1 * matrix_x) - matrix_y) ** 2).sum()
    return cost

I also tried to do the same for the gradient descent.

Gradient descent:

def gradDescent(alpha,matrix_x,matrix_y):
    global theta_0,theta_1,m,cost
    cost = calculateCost(sat,gpa,m)
    while cost > 1:
        temp_0 = theta_0 - alpha * (1 / m) * (theta_0 + theta_1 * matrix_x - matrix_y).sum()
        temp_1 = theta_1 - alpha * (1 / m) * (matrix_x.transpose() * (theta_0 + theta_1 * matrix_x - matrix_y)).sum()
        theta_0 = temp_0
        theta_1 = temp_1

I am not entirely sure whether both implementations are correct. The implementation returned a cost of 114.89379821428574, and somehow this is what the "descent" looked like when I graphed the costs:

Gradient descent graph: [image in the original post plotting the cost per iteration]

Please correct me if I have implemented either the cost function or the gradient descent incorrectly, and provide an explanation if possible, as I am still a beginner in multivariable calculus. Thank you.

There are many issues with that code.

First, the two main issues that are behind the bugs:

1) The line

temp_1 = theta_1 - alpha * (1 / m) * (matrix_x.transpose() * (theta_0 + theta_1 * matrix_x - matrix_y)).sum()

specifically the matrix multiplication matrix_x.transpose() * (theta_0 + ...). The * operator does element-wise multiplication (with broadcasting), so the result is of size 20x20, where you expect a gradient of size 1x1 (as you update a single real variable theta_1).
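To see the difference, here is a minimal sketch of the two operations (the 20x1 shapes are an assumption mirroring your data set):

import numpy as np

x = np.random.rand(20, 1)            # stand-in for matrix_x
err = np.random.rand(20, 1)          # stand-in for (theta_0 + theta_1 * matrix_x - matrix_y)

elementwise = x.transpose() * err    # broadcasting: (1,20) * (20,1) -> (20,20)
matmul = x.transpose().dot(err)      # matrix product: (1,20) @ (20,1) -> (1,1)

print(elementwise.shape)             # (20, 20) -- 400 numbers, not a gradient
print(matmul.shape)                  # (1, 1)   -- the scalar you actually want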

2) The while cost > 1: condition in your gradient computation. You never update the cost inside the loop...

Here is a version of your code that works:

import numpy as np
import matplotlib.pyplot as plt

sat=np.random.rand(40,1)
rand_a=np.random.randint(500)
rand_b=np.random.randint(400)
gpa=rand_a*sat+rand_b
theta_0 = 0.01
theta_1 = 0.01
alpha = 0.1
cost = 0
m = len(gpa)

def calculateCost(matrix_x,matrix_y,m):
    global theta_0,theta_1
    cost = (1 / 2 * m) * ((theta_0 + (theta_1 * matrix_x) - matrix_y) ** 2).sum()
    return cost

def gradDescent(alpha,matrix_x,matrix_y,num_iter=10000,eps=0.5):
    global theta_0,theta_1,m,cost
    cost = calculateCost(matrix_x,matrix_y,m)
    cost_hist=[cost]
    for i in range(num_iter):
        # .dot yields the scalar product here, not an m-by-m element-wise matrix
        theta_0 -= alpha * (1 / m) * (theta_0 + theta_1 * matrix_x - matrix_y).sum()
        theta_1 -= alpha * (1 / m) * (matrix_x.transpose().dot(theta_0 + theta_1 * matrix_x - matrix_y)).sum()
        cost = calculateCost(matrix_x,matrix_y,m)  # re-evaluate the cost on every iteration
        cost_hist.append(cost)
        if cost<eps:
            break
    return cost_hist  # return the history even when eps was never reached

if __name__=="__main__":

    print("init_cost==",cost)
    cost_hist=gradDescent(alpha,sat,gpa)
    print("final_cost,num_iters",cost,len(cost_hist))
    print(rand_b,theta_0,rand_a,theta_1)
    plt.plot(cost_hist,linewidth=5,color="r");plt.show()
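Since gpa is generated as rand_a*sat + rand_b, the final print line lets you compare the learned theta_0 and theta_1 directly against the generating rand_b and rand_a.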

Finally, the coding style itself, while not responsible for the bugs, is definitely an issue here. Generally, global variables are just bad practice: they lead to bug-prone, unmaintainable code. It is always better to store them in small data structures and pass them around to functions. In your case, you can just put the initial parameters in a list, pass them to your gradient computation function, and return the optimized ones at the end, as in the sketch below.
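For illustration, a minimal sketch of such a refactor, assuming a small dataclass for the two parameters (the names LinRegParams and grad_descent are mine, not from the code above):

from dataclasses import dataclass
import numpy as np

@dataclass
class LinRegParams:          # hypothetical container replacing the globals
    theta_0: float = 0.01
    theta_1: float = 0.01

def grad_descent(params, x, y, alpha=0.1, num_iter=10000, eps=0.5):
    # Parameters go in, updated parameters and the cost history come out -- no globals.
    m = len(y)
    cost_hist = []
    for _ in range(num_iter):
        err = params.theta_0 + params.theta_1 * x - y      # residuals with current params
        params.theta_0 -= alpha / m * err.sum()
        params.theta_1 -= alpha / m * x.transpose().dot(err).sum()
        cost = (1 / (2 * m)) * ((params.theta_0 + params.theta_1 * x - y) ** 2).sum()
        cost_hist.append(cost)
        if cost < eps:
            break
    return params, cost_hist

x = np.random.rand(40, 1)
y = 3.0 * x + 2.0                                          # synthetic line to recover
params, hist = grad_descent(LinRegParams(), x, y)
print(params, hist[-1])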

You made a mistake in the implementation of the cost function:

1 / 2 * m is interpreted as (1/2) * m, i.e. m/2; you should write 1 / (2 * m) or 1/2/m.
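A quick precedence check (using m = 20 as an example):

m = 20
print(1 / 2 * m)      # 10.0   -- parsed as (1/2) * m, i.e. m/2
print(1 / (2 * m))    # 0.025  -- what the cost function needs
print(1 / 2 / m)      # 0.025  -- the equivalent fix suggested above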
