
Gradient descent for linear regression with numpy

I want to implement gradient descent with numpy for linear regression, but I get an error in this code:

import numpy as np

# Code Example
rng = np.random.RandomState(10)
X = 10*rng.rand(1000, 5) # feature matrix
y = 0.9 + np.dot(X, [2.2, 4, -4, 1, 2]) # target vector

# GD implementation for linear regression
def GD(X, y, eta=0.1, n_iter=20):
    theta = np.zeros((X.shape[0], X.shape[1]))
    for i in range(n_iter):
        grad = 2 * np.mean((np.dot(theta.T, X) - y) * X)
        theta = theta - eta * grad
    return theta

# SGD implementation for linear regression
def SGD(X, y, eta=0.1, n_iter=20):
    theta = np.zeros(1, X.shape[1])
    for i in range(n_iter):
        for j in range(X.shape[0]):
            grad = 2 * np.mean((np.dot(theta.T, X[j,:]) - y[j]) * X[j,:])
            theta = theta - eta * grad
    return theta

# MSE loss for linear regression with numpy
def MSE(X, y, theta):
    return np.mean((X.dot(theta.T) - y)**2)

# linear regression with GD and MSE with numpy
theta_gd = GD(X, y)
theta_sgd = SGD(X, y)

print('MSE with GD: ', MSE(X, y, theta_gd))
print('MSE with SGD: ', MSE(X, y, theta_sgd))

The error is

grad = 2 * np.mean((np.dot(theta.T, X) - y) * X)
ValueError: operands could not be broadcast together with shapes (5,5) (1000,)

and I can't solve it.

Each observation has 5 features, and X contains 1000 observations:

X = rng.rand(1000, 5) * 10  # X.shape == (1000, 5)

Create y which is perfectly linearly correlated with X (with no distortions):

real_weights = np.array([2.2, 4, -4, 1, 2]).reshape(-1, 1)
real_bias = 0.9
y = X @ real_weights + real_bias  # y.shape == (1000, 1)

GD implementation for linear regression:

Note: w (weights) is your theta variable. I have also added the calculation of b (bias).

def GD(X, y, eta=0.1, n_iter=20):
    # Initialize weights and a bias (all zeros):
    w = np.zeros((X.shape[1], 1))  # w.shape == (5, 1)
    b = 0
    # Gradient descent
    for i in range(n_iter):
        errors = X @ w + b - y  # errors.shape == (1000, 1)
        dw = 2 * np.mean(errors * X, axis=0).reshape(5, 1)
        db = 2 * np.mean(errors)
        w -= eta * dw
        b -= eta * db
    return w, b

Testing:

w, b = GD(X, y, eta=0.003, n_iter=5000)
print(w, b)
[[ 2.20464905]
 [ 4.00510139]
 [-3.99569374]
 [ 1.00444026]
 [ 2.00407476]] 0.7805448262466914
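
To check this fit with the question's MSE idea, the bias has to be included alongside the weights. A minimal sketch (MSE_with_bias is just an illustrative helper, since the question's MSE only takes theta):

def MSE_with_bias(X, y, w, b):
    # Mean squared error for the (weights, bias) parameterization used above.
    return np.mean((X @ w + b - y) ** 2)

print(MSE_with_bias(X, y, w, b))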

Notes:

  • Your function SGD also contains some errors (see the sketch after this list for one possible per-sample variant).
  • I'm using the @ operator simply because I prefer it over np.dot.
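
Regarding the first note, this is roughly what a per-sample (stochastic) variant of the GD above could look like. A rough sketch only: SGD_sketch is an illustrative name, and eta/n_iter may need tuning.

def SGD_sketch(X, y, eta=0.001, n_iter=20):
    # One stochastic update per observation, same (weights, bias) parameterization as GD above.
    w = np.zeros((X.shape[1], 1))
    b = 0.0
    for i in range(n_iter):
        for j in range(X.shape[0]):
            xj = X[j:j+1, :]               # shape (1, 5)
            error = xj @ w + b - y[j]      # shape (1, 1)
            w -= eta * 2 * (error * xj).T  # per-sample gradient w.r.t. w
            b -= eta * 2 * error.item()    # per-sample gradient w.r.t. b
    return w, b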

Minor changes to your code to resolve the dimensionality issues during matrix multiplication make it run successfully. In particular, note that a linear regression on a design matrix X of dimension Nxk has a parameter vector theta of size k.
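
The shapes make the problem visible. A quick check using the question's setup (theta_bad and theta_ok are illustrative names):

import numpy as np

rng = np.random.RandomState(10)
X = 10 * rng.rand(1000, 5)
y = 0.9 + np.dot(X, [2.2, 4, -4, 1, 2])          # y.shape == (1000,)

theta_bad = np.zeros((X.shape[0], X.shape[1]))   # (1000, 5): one row per observation
print(np.dot(theta_bad.T, X).shape)              # (5, 5), which cannot broadcast with y of shape (1000,)

theta_ok = np.zeros(X.shape[1])                  # (5,): one parameter per feature
print((X @ theta_ok - y).shape)                  # (1000,): the residual now lines up with y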

In addition, I'd suggest some changes in SGD() that make it a proper stochastic gradient descent: namely, evaluating the gradient over random subsets of the data, realized by randomly shuffling the index set of the training data with np.random.shuffle() and looping through it in slices. The batch_size argument determines the size of each subset, after which the parameter estimate is updated. The seed argument ensures reproducibility.

# GD implementation for linear regression
def GD(X, y, eta=0.001, n_iter=100):
    theta = np.zeros(X.shape[1])
    for i in range(n_iter):
        for j in range(X.shape[0]):
            grad = (2 * np.mean(X[j,:] @ theta - y[j]) * X[j,:])  # changed line
            theta -= eta * grad
    return theta

# SGD implementation for linear regression
def SGD(X, y, eta=0.001, n_iter=1000, batch_size=25, seed=7678):
    theta = np.zeros(X.shape[1])
    indexSet = list(range(len(X)))
    np.random.seed(seed)
    for i in range(n_iter):
        np.random.shuffle(indexSet) # random shuffle of index set
        for j in range(round(len(X) / batch_size)+1):
            X_sub = X[indexSet[j*batch_size:(j+1)*batch_size],:]
            y_sub = y[indexSet[j*batch_size:(j+1)*batch_size]]
            if(len(X_sub) > 0):
                grad = (2 * np.mean(X_sub @ theta - y_sub) * X_sub)  # changed line
                theta -= eta * np.mean(grad, axis=0)
    return theta
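
For reference, the estimates used below come from calling the revised functions just as in the question, with the default arguments shown above (an assumption, since the calls are not repeated here):

theta_gd = GD(X, y)    # eta=0.001, n_iter=100
theta_sgd = SGD(X, y)  # eta=0.001, n_iter=1000, batch_size=25, seed=7678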

Running the code, I get

print('MSE with GD : ',  MSE(X, y, theta_gd))
print('MSE with SGD: ', MSE(X, y, theta_sgd))
MSE with GD :  0.07602
MSE with SGD:  0.05762
