Simple Linear Regression not converging

In an attempt to dig deeper into the math behind machine learning models, I'm implementing an Ordinary Least Squares algorithm in Python, using vectorization. My references are:

This is what I have now:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

%matplotlib inline

X, y = datasets.load_diabetes(return_X_y=True)

# We only take the first feature (for visualization purposes).
X = X[:, np.newaxis, 2]

# Split the data into training/testing sets
X_train = X[:-20]
X_test = X[-20:]
y_train = y[:-20]
y_test = y[-20:]

# Input data
sns.scatterplot(
    x=X_train[:, 0],
    y=y_train,
    label="train",
    edgecolor=None,
    color="blue"
)
# To predict
sns.scatterplot(
    x=X_test[:, 0],
    y=y_test,
    label="test",
    edgecolor=None,
    marker="*",
    color="red",
);

class LinearRegression:
    """
    Ordinary least squares Linear Regression.

    Args:
 
    """

    def __init__(self, learning_rate: float = 0.01, tolerance: float = 1e4, standardize: bool = True):
        # TODO: standardize if required
        self._learning_rate: float = learning_rate
        self._tolerance: float = tolerance
        self._standardize: bool = standardize
        self._fitted: bool = False
        

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        """Fit linear model."""
        self._X: np.ndarray = X
        self._y: np.ndarray = y[:, np.newaxis]
        self._m, self._n = self._X.shape  # rows, features
        self._weights: np.ndarray = np.zeros((self._n, 1))
            
        self._train()

    def predict(self, X: np.ndarray, add_bias: bool = True) -> np.ndarray:
        """Predict using the linear model."""
        assert self._fitted, "Model not fitted."
        if add_bias:
            X = np.c_[np.ones((X.shape[0], 1)), X]
        
        predictions = np.dot(X, self._weights)
        return predictions

    def _train(self) -> None:
        """
        Fit the weights to the training data with gradient descent.

        Algorithm:
            1. Initialize weights.
            2. Compute the predictions and residuals.
            3. Calculate the gradient.
            4. Update weights.
            5. Repeat from 2 until convergence.
        """
        # Add bias term
        self._X = np.c_[np.ones((self._m, 1)), self._X]
        self._weights = np.r_[np.ones((1, 1)), self._weights]
        
        self._fitted = True
        
        converged = False
        iterations = 0
        while not converged:
            iterations += 1
            y_hat = self.predict(self._X, add_bias=False)
            residuals = self._residuals(self._y, y_hat)
            gradients = self._gradients(self._X, residuals)
            self._weights -= self._learning_rate * gradients
                                       
            gradient_magnitude = np.linalg.norm(gradients)
            print(gradient_magnitude)
            if gradient_magnitude < self._tolerance:
                converged = True
                
            print(self._weights)
            print(iterations)
    
    def _residuals(self, y: np.ndarray, y_hat: np.ndarray) -> np.ndarray:
        residuals = y - y_hat
        return residuals
    
    def _gradients(self, X: np.ndarray, residuals: np.ndarray) -> np.ndarray:
        gradients = -2 * np.dot(X.T, residuals)
        return gradients

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

clf = LinearRegression()
clf.fit(X_train, y_train)

The problem I'm facing is that my weights keep increasing until I end up with a bunch of NaNs. I've been trying to find out what I'm missing, but so far no luck. I also tried tweaking the tolerance threshold, but I don't think that's the issue; I suspect something is wrong with my math.

Your code actually seems to work fine, except for the learning rate! Just reduce it from 0.01 to e.g. 0.0001 and everything works (well, I would also reduce the tolerance to something much, much smaller, like 1e-5, to make sure it actually converges to the right solution).
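To see why 0.01 is too large while 0.0001 is fine, one can estimate the largest step size that plain gradient descent tolerates for this loss. The following is a minimal sketch, not part of the original code: the helper name `max_stable_learning_rate` is hypothetical, and synthetic standardized data with 422 rows stands in for the training feature (assuming the split above leaves 422 rows). For the unnormalized squared-error loss, whose gradient is `-2 * X.T @ residuals`, the Hessian is `2 * X.T @ X`, and gradient descent is stable only when the learning rate is below 2 divided by the Hessian's largest eigenvalue.

```python
import numpy as np


def max_stable_learning_rate(X: np.ndarray) -> float:
    """Upper bound on the step size for gradient descent on sum((y - X @ w) ** 2).

    Hypothetical helper for illustration; X must already include the bias column.
    """
    hessian = 2.0 * X.T @ X  # second derivative of the summed squared error
    largest_eigenvalue = np.linalg.eigvalsh(hessian).max()
    return 2.0 / largest_eigenvalue


# Synthetic stand-in for the standardized training feature (assumed 422 rows).
rng = np.random.default_rng(0)
feature = rng.normal(size=422)
feature = (feature - feature.mean()) / feature.std()

# Same layout as in _train: a column of ones followed by the feature.
X = np.c_[np.ones(422), feature]

bound = max_stable_learning_rate(X)
print(bound)
```

Because the gradient is a sum over all rows rather than a mean, the bound shrinks as the number of rows grows. Under these assumptions it lies between 0.0001 and 0.01, which is consistent with the first value converging and the second one blowing up. Dividing the gradient by the number of rows (a different convention from the one in the question) would make the usable learning rate independent of the dataset size.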

Small image showing that it works:

clf = LinearRegression(learning_rate=0.0001)
clf.fit(X_train, y_train)
b, m = clf._weights[:, 0]
plt.scatter(X_train[:, 0], y_train)
plt.plot([-2, 4], [-2 * m + b, 4 * m + b])

gives

[Image: plot result]

Linear regression is a convex optimization problem, so you can picture it as putting a ball on a parabola and then moving it towards the bottom by a fixed amount multiplied by the slope at its current position. If that "fixed amount" is small enough, you get closer and closer to the bottom until you reach the optimum. But if it is too large, you jump from one side of the parabola to the other, and if it's large enough you land at a point that is actually higher than where you started. Iterate this a few times and you end up in exactly the situation you had...
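The ball-on-a-parabola picture can be reproduced in a few lines. This is only an illustrative sketch: the function `descend` and the curvature value 422 are assumptions chosen to roughly mimic a summed squared error over a few hundred rows, not values taken from the code in the question.

```python
def descend(learning_rate: float, curvature: float = 422.0,
            start: float = 1.0, steps: int = 5) -> list:
    """Run gradient descent on f(w) = curvature * w**2 and return the visited positions."""
    w = start
    path = [w]
    for _ in range(steps):
        gradient = 2.0 * curvature * w  # slope of the parabola at w
        w -= learning_rate * gradient   # step against the slope
        path.append(w)
    return path


small = descend(0.0001)  # each step moves a little closer to the bottom at w = 0
large = descend(0.01)    # each step overshoots to the other side and lands higher up

print(small)
print(large)
```

With the small learning rate the position shrinks steadily towards 0. With the large one the sign flips on every step and the magnitude grows, which is the same runaway behaviour that eventually produces NaNs in the weights.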
