Gradient descent for linear regression with numpy
I want to implement gradient descent with numpy for linear regression, but I get an error in this code:
import numpy as np

rng = np.random.RandomState(10)
X = 10 * rng.rand(1000, 5)                # feature matrix
y = 0.9 + np.dot(X, [2.2, 4, -4, 1, 2])   # target vector

# GD implementation for linear regression
def GD(X, y, eta=0.1, n_iter=20):
    theta = np.zeros((X.shape[0], X.shape[1]))
    for i in range(n_iter):
        grad = 2 * np.mean((np.dot(theta.T, X) - y) * X)
        theta = theta - eta * grad
    return theta

# SGD implementation for linear regression
def SGD(X, y, eta=0.1, n_iter=20):
    theta = np.zeros(1, X.shape[1])
    for i in range(n_iter):
        for j in range(X.shape[0]):
            grad = 2 * np.mean((np.dot(theta.T, X[j, :]) - y[j]) * X[j, :])
            theta = theta - eta * grad
    return theta

# MSE loss for linear regression with numpy
def MSE(X, y, theta):
    return np.mean((X.dot(theta.T) - y) ** 2)

# linear regression with GD and SGD, evaluated with MSE
theta_gd = GD(X, y)
theta_sgd = SGD(X, y)
print('MSE with GD: ', MSE(X, y, theta_gd))
print('MSE with SGD: ', MSE(X, y, theta_sgd))
The error is

grad = 2 * np.mean((np.dot(theta.T, X) - y) * X)
ValueError: operands could not be broadcast together with shapes (5,5) (1000,)

and I can't solve it.
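For reference, here is how I understand the shapes involved (a standalone snippet reproducing the mismatch):

```python
import numpy as np

X = np.ones((1000, 5))
theta = np.zeros((X.shape[0], X.shape[1]))  # shape (1000, 5), as in GD above
print(np.dot(theta.T, X).shape)             # (5, 5)
# Subtracting y, of shape (1000,), from this (5, 5) array
# is what raises the broadcasting ValueError.
```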
Each observation has 5 features, and X contains 1000 observations:
X = rng.rand(1000, 5) * 10 # X.shape == (1000, 5)
Create y, which is perfectly linearly correlated with X (with no distortions):
real_weights = np.array([2.2, 4, -4, 1, 2]).reshape(-1, 1)
real_bias = 0.9
y = X @ real_weights + real_bias # y.shape == (1000, 1)
GD implementation for linear regression:
Note: w (weights) is your theta variable. I have also added the calculation of b (bias).
def GD(X, y, eta=0.1, n_iter=20):
    # Initialize weights and a bias (all zeros):
    w = np.zeros((X.shape[1], 1))  # w.shape == (5, 1)
    b = 0
    # Gradient descent
    for i in range(n_iter):
        errors = X @ w + b - y     # errors.shape == (1000, 1)
        dw = 2 * np.mean(errors * X, axis=0).reshape(5, 1)
        db = 2 * np.mean(errors)
        w -= eta * dw
        b -= eta * db
    return w, b
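For reference, the update rules above follow from differentiating the MSE loss with respect to w and b (a short derivation in my own notation, not part of the original code):

```latex
L(w, b) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i^\top w + b - y_i\right)^2,
\qquad
\frac{\partial L}{\partial w} = \frac{2}{N}\sum_{i=1}^{N}\left(x_i^\top w + b - y_i\right)x_i,
\qquad
\frac{\partial L}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}\left(x_i^\top w + b - y_i\right)
```

which is exactly what dw = 2 * np.mean(errors * X, axis=0) and db = 2 * np.mean(errors) compute.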
Testing:
w, b = GD(X, y, eta=0.003, n_iter=5000)
print(w, b)
[[ 2.20464905]
[ 4.00510139]
[-3.99569374]
[ 1.00444026]
[ 2.00407476]] 0.7805448262466914
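As a quick sanity check (my addition, not part of the original answer), the same parameters can be recovered in closed form with np.linalg.lstsq by appending a bias column of ones to X; since y here is noiseless, the exact weights and bias come back:

```python
import numpy as np

rng = np.random.RandomState(10)
X = 10 * rng.rand(1000, 5)
real_weights = np.array([2.2, 4, -4, 1, 2]).reshape(-1, 1)
y = X @ real_weights + 0.9

# Append a column of ones so the bias is estimated jointly with the weights
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(params.ravel())  # ~ [2.2, 4, -4, 1, 2, 0.9]
```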
Notes:
Your function SGD also contains some errors. I'm using the @ operator simply because I prefer it over np.dot.

Minor changes in your code that resolve the dimensionality issues during matrix multiplication make the code run successfully. In particular, note that a linear regression on a design matrix X of dimension Nxk has a parameter vector theta of size k.
In addition, I'd suggest some changes in SGD() that make it a proper stochastic gradient descent: namely, evaluating the gradient over random subsets of the data, realized by randomly partitioning the index set of the training data with np.random.shuffle() and looping through it. The batch_size determines the size of each subset, after which the parameter estimate is updated. The argument seed ensures reproducibility.
# GD implementation for linear regression
def GD(X, y, eta=0.001, n_iter=100):
    theta = np.zeros(X.shape[1])
    for i in range(n_iter):
        for j in range(X.shape[0]):
            grad = 2 * np.mean(X[j, :] @ theta - y[j]) * X[j, :]  # changed line
            theta -= eta * grad
    return theta

# SGD implementation for linear regression
def SGD(X, y, eta=0.001, n_iter=1000, batch_size=25, seed=7678):
    theta = np.zeros(X.shape[1])
    indexSet = list(range(len(X)))
    np.random.seed(seed)
    for i in range(n_iter):
        np.random.shuffle(indexSet)  # random shuffle of index set
        for j in range(round(len(X) / batch_size) + 1):
            X_sub = X[indexSet[j * batch_size:(j + 1) * batch_size], :]
            y_sub = y[indexSet[j * batch_size:(j + 1) * batch_size]]
            if len(X_sub) > 0:
                grad = 2 * np.mean(X_sub @ theta - y_sub) * X_sub  # changed line
                theta -= eta * np.mean(grad, axis=0)
    return theta
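To make the batching logic concrete, here is a standalone sketch (with small, made-up sizes) of how the shuffled index set is sliced into mini-batches, including the shorter final batch that the len(X_sub) > 0 guard accounts for:

```python
import numpy as np

n, batch_size = 10, 4
indexSet = list(range(n))
np.random.seed(7678)
np.random.shuffle(indexSet)

# Same slicing as in SGD above; round(n / batch_size) + 1 may
# produce an empty trailing slice, which the guard filters out.
batches = [indexSet[j * batch_size:(j + 1) * batch_size]
           for j in range(round(n / batch_size) + 1)]
batches = [b for b in batches if len(b) > 0]
print([len(b) for b in batches])  # [4, 4, 2]
```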
Running the code, I get
print('MSE with GD : ', MSE(X, y, theta_gd))
print('MSE with SGD: ', MSE(X, y, theta_sgd))
MSE with GD : 0.07602
MSE with SGD: 0.05762