
Numpy based gradient descent not fully converging

I believe I have implemented GD correctly (partially based on Aurelien Geron's book), but it is not returning the same result as sklearn's Linear Regression. Here is the full notebook: https://colab.research.google.com/drive/17lvCb_F_vMskT1PxbrKCSR57B5lMWT7A?usp=sharing

I'm not doing anything fancy, here is the code to load the training data:

import numpy as np
import pandas as pd
import sklearn.datasets

#load data
data_arr = sklearn.datasets.load_diabetes(as_frame=True).data.values

X_raw = data_arr[:,1:] 
y_raw = data_arr[:, 1:2]

#add bias
X = np.hstack((np.ones(y_raw.shape),X_raw))
y = y_raw

#do gradient descent
learning_rate = 0.001
iterations = 1_000_000

observations = X.shape[0]
features = X.shape[1]

w = np.ones((features,1))

for i in range(iterations):
    w -= (learning_rate) * (2/observations) * X.T.dot(X.dot(w) - y)

Here are the weights this produced:

array([[ 2.72774600e-17],
       [ 1.01847403e+00],
       [ 3.87858604e-02],
       [ 3.06547577e-04],
       [-3.67525543e-01],
       [ 9.09006216e-02],
       [ 4.21512716e-01],
       [ 4.25673672e-01],
       [ 4.77147289e-02],
       [-8.14471370e-03]])

And the MSE: 5.24937033143115e-05

Here is what sklearn gives me:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

%time reg = LinearRegression().fit(X, y)
reg.coef_

sklearn weights:

array([[ 0.00000000e+00,  1.00000000e+00, -9.99200722e-16,
        -1.69309011e-15, -1.11022302e-16,  1.38777878e-15,
        -3.88578059e-16,  6.80011603e-16, -8.32667268e-17,
        -5.55111512e-16]])

sklearn MSE: 1.697650600978984e-32

I've tried increasing/decreasing the number of epochs and the size of the learning rate. Scikit-learn returns its result in a few milliseconds, while my GD implementation can run for minutes and still not get anywhere close to sklearn's results.

Am I doing something obviously wrong here?

(The notebook contains a cleaner version of this code.)

There is a small bug in your code: the first column of X_raw is the same as y_raw (both slices start at column 1), i.e. the target is being used as a feature. This is why sklearn reaches an essentially zero MSE with a coefficient of 1 on that column. It has been corrected in the code below.
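
A quick way to confirm this, reusing the slicing from the question, is to compare the two arrays directly:

import numpy as np
import sklearn.datasets

data_arr = sklearn.datasets.load_diabetes(as_frame=True).data.values

# slicing as in the question: both slices start at column 1
X_raw = data_arr[:, 1:]
y_raw = data_arr[:, 1:2]

# the first feature column is identical to the target
print(np.array_equal(X_raw[:, :1], y_raw))  # True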

Another issue is that if you include a column of ones in the feature matrix X, then when fitting the linear regression with sklearn you should make sure to set fit_intercept=False, otherwise the intercept is effectively modeled twice (once by the ones column and once by sklearn's own intercept term).
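
To illustrate (this check is not in the original answer): with the default fit_intercept=True on an X that already contains a ones column, sklearn centers the data internally, so the ones column ends up with a near-zero coefficient while the intercept is estimated separately:

from sklearn.linear_model import LinearRegression

# assumes X (with a ones column at index 0) and y as built in the question
reg_default = LinearRegression().fit(X, y)  # fit_intercept=True by default
print(reg_default.coef_[0, 0])  # ~0.0: the centered ones column carries no signal
print(reg_default.intercept_)   # the intercept is estimated separately instead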

It is also not clear why you are dividing by the number of observations in the gradient update, as this scales down the effective learning rate significantly (by a factor of n = 442 here).
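
Concretely, the 1/n factor only rescales the step size. A minimal sketch of the equivalence, using made-up stand-in data (the *_demo names are illustrative, not from the question):

import numpy as np

rng = np.random.default_rng(0)
n, p = 442, 10
X_demo = rng.normal(size=(n, p))  # stand-in design matrix
y_demo = rng.normal(size=(n, 1))
w_demo = np.ones((p, 1))
lr = 0.001

grad = X_demo.T @ (X_demo @ w_demo - y_demo)

step_question = lr * (2 / n) * grad  # update from the question (divides by n)
step_answer = 2 * lr * grad          # update from the answer (no division)

# the only difference is a factor of n = 442 in the effective step size
print(np.allclose(step_answer, n * step_question))  # True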

import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# load data
data_arr = sklearn.datasets.load_diabetes(as_frame=True).data.values

# extract features and target
X_raw = data_arr[:, 1:]
y_raw = data_arr[:, :1]

# add bias
X = np.hstack((np.ones(y_raw.shape), X_raw))
y = y_raw

# do gradient descent
learning_rate = 0.001
iterations = 1000000

observations = X.shape[0]
features = X.shape[1]

w = np.ones((features, 1))

for i in range(iterations):
    w -= 2 * learning_rate * X.T.dot(X.dot(w) - y)

# exclude the intercept as X already contains a column of ones
reg = LinearRegression(fit_intercept=False).fit(X, y)

# compare the estimated coefficients
res = pd.DataFrame({
    'manual': [format(x, '.6f') for x in w.flatten()],
    'sklearn': [format(x, '.6f') for x in reg.coef_.flatten()]
})

res
#       manual    sklearn
# 0  -0.000000  -0.000000
# 1   0.101424   0.101424
# 2  -0.006468  -0.006468
# 3   0.208211   0.208211
# 4  -0.128653  -0.128653
# 5   0.236556   0.236556
# 6   0.132544   0.132544
# 7  -0.039359  -0.039359
# 8   0.177129   0.177129
# 9   0.145396   0.145396

# compare the RMSE
print(format(mean_squared_error(y, X.dot(w), squared=False), '.6f'))
# 0.043111

print(format(mean_squared_error(y, reg.predict(X), squared=False), '.6f'))
# 0.043111
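
As an extra sanity check (not part of the original answer), both solutions can be compared against the closed-form least-squares fit; the tolerance below is an assumed, fairly loose value:

# closed-form least-squares solution as a reference point
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w, w_lstsq, atol=1e-6))            # gradient descent matches
print(np.allclose(reg.coef_.T, w_lstsq, atol=1e-6))  # sklearn matches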
