
Linear regression implementation always performs worse than sklearn

I implemented linear regression with gradient descent in Python. To see how well it is doing, I compared it with scikit-learn's LinearRegression() class. For some reason, sklearn always outperforms my program by an MSE of about 3 on average (I am using the Boston Housing dataset for testing). I understand that I am currently not doing gradient checking to test for convergence, but I am allowing many iterations and have set the learning rate low enough that it SHOULD converge. Is there any clear bug in my learning algorithm implementation? Here is my code:

import numpy as np
from sklearn.linear_model import LinearRegression

def getWeights(x):
    lenWeights = len(x[1,:]);
    weights = np.random.rand(lenWeights)
    bias = np.random.random();
    return weights,bias

def train(x,y,weights,bias,maxIter):
    converged = False;
    iterations = 1;
    m = len(x);
    alpha = 0.001;
    while not converged:
            for i in range(len(x)):
                # Dot product of weights and training sample
                hypothesis = np.dot(x[i,:], weights) + bias;
                # Calculate gradient
                error = hypothesis - y[i];
                grad = (alpha * 1/m) * ( error * x[i,:] );
                # Update weights and bias
                weights = weights - grad;
                bias = bias - alpha * error;
                iterations = iterations + 1;

                if iterations > maxIter:
                    converged = True;
                    break

    return weights, bias

def predict(x, weights, bias):
    return np.dot(x,weights) + bias

if __name__ == '__main__':

    data = np.loadtxt('housing.txt');
    x = data[:,:-1];
    y = data[:,-1];
    for i in range(len(x[1,:])):
        x[:,i] = ( (x[:,i] - np.min(x[:,i])) / (np.max(x[:,i]) - np.min(x[:,i])) );

    initialWeights,initialBias = getWeights(x);
    weights,bias = train(x,y,initialWeights,initialBias,55000);
    pred = predict(x, weights,bias);
    MSE = np.mean(abs(pred - y));

    print "This Program MSE: " + str(MSE)

    sklearnModel = LinearRegression();
    sklearnModel = sklearnModel.fit(x,y);
    sklearnModel = sklearnModel.predict(x);

    skMSE = np.mean(abs(sklearnModel - y));

    print "Sklearn MSE: " + str(skMSE)

First, make sure that you are computing the correct objective function value. The linear regression objective should be .5*np.mean((pred-y)**2), rather than np.mean(abs(pred - y)).
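As a minimal sketch (the function names are mine, and pred/y are assumed to be NumPy arrays as in the code above), the two quantities can be computed side by side:

import numpy as np

def half_mse(pred, y):
    # Least-squares objective that linear regression actually minimizes
    return 0.5 * np.mean((pred - y) ** 2)

def mean_abs_error(pred, y):
    # What the original code reports under the name "MSE"
    return np.mean(np.abs(pred - y))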

You are actually running a stochastic gradient descent (SGD) algorithm (running a gradient iteration on individual examples), which should be distinguished from "gradient descent".
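For contrast, here is a rough sketch of one full-batch gradient descent step, which averages the gradient over the whole training set before updating; the variable names mirror the question's code, but the function itself is mine:

import numpy as np

def batch_gradient_step(x, y, weights, bias, alpha):
    # Predictions and residuals over the entire training set
    error = x.dot(weights) + bias - y
    m = len(x)
    # One step along the averaged gradient (contrast with the per-example SGD update in train())
    weights = weights - alpha * x.T.dot(error) / m
    bias = bias - alpha * np.mean(error)
    return weights, bias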

SGD is a good learning method, but a bad optimization method - it can take many iterations to converge to a minimum of the empirical error ( http://leon.bottou.org/publications/pdf/nips-2007.pdf ).

For SGD to converge, the learning rate must be restricted. Typically, the learning rate is set to the base learning rate divided by the number of iterations, something like alpha/(iterations+1), using the variables in your code.

You also include a factor of 1/m in your gradient, which is typically not used in SGD updates.
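Putting the last two points together, a per-example update might look like the sketch below (the helper name sgd_update is mine; xi and yi stand for x[i,:] and y[i] from the question's loop). This is an illustration, not a verified drop-in replacement:

import numpy as np

def sgd_update(xi, yi, weights, bias, alpha, iterations):
    # Decaying step size, e.g. alpha / (iterations + 1)
    rate = alpha / (iterations + 1)
    error = np.dot(xi, weights) + bias - yi
    # Plain SGD update on a single example: no 1/m factor
    weights = weights - rate * error * xi
    bias = bias - rate * error
    return weights, bias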

To test your SGD implementation, rather than evaluating the error on the dataset that you trained with, split the dataset into a training set and a test set, and evaluate the error on this test set after training with both methods. The training/test split will allow you to estimate the performance of your algorithm as a learning algorithm (estimating the expected error) rather than as an optimization algorithm (minimizing the empirical error).
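A quick way to do the split, reusing the question's getWeights, train, and predict functions; note that in older scikit-learn versions train_test_split lives in sklearn.cross_validation rather than sklearn.model_selection:

from sklearn.model_selection import train_test_split

# Hold out 30% of the data for testing; the split fraction and seed are arbitrary choices
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.3, random_state=0)

w0, b0 = getWeights(xTrain)
weights, bias = train(xTrain, yTrain, w0, b0, 55000)
testMSE = 0.5 * np.mean((predict(xTest, weights, bias) - yTest) ** 2)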

Try increasing your iteration value. This should allow your algorithm to converge on a value that is, hopefully, closer to the global minimum. Keep in mind you are not using L-BFGS, which can converge much faster than plain gradient descent or even SGD.
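For reference only (this is not part of the original program), SciPy's L-BFGS-B solver can minimize the same least-squares objective directly, assuming SciPy is installed:

import numpy as np
from scipy.optimize import minimize

def lbfgs_fit(x, y):
    def objective(params):
        # params packs the weights followed by the bias term
        w, b = params[:-1], params[-1]
        return 0.5 * np.mean((x.dot(w) + b - y) ** 2)
    p0 = np.zeros(x.shape[1] + 1)
    result = minimize(objective, p0, method='L-BFGS-B')
    return result.x[:-1], result.x[-1]  # weights, bias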

Also try using the normal equation as another way to do linear regression.

http://eli.thegreenplace.net/2014/derivation-of-the-normal-equation-for-linear-regression/
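A sketch of the closed-form fit, appending a column of ones so the last coefficient plays the role of the bias; np.linalg.lstsq is used rather than an explicit matrix inverse for numerical stability:

import numpy as np

def normal_equation_fit(x, y):
    # Augment x with a column of ones to absorb the bias term
    X = np.hstack([x, np.ones((len(x), 1))])
    # Least-squares solution of X * theta ~= y (equivalent to solving the normal equation)
    theta = np.linalg.lstsq(X, y)[0]
    return theta[:-1], theta[-1]  # weights, bias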
