Multiple linear regression with scikit-learn: different methods give different answers

Question

This is probably equally valid on the stats Stack Exchange as here (it could be the statistics or the Python that I'm not sure about).

Suppose I have two independent variables X and Y that explain some of the variance of Z.

    from sklearn.linear_model import LinearRegression
    import numpy as np
    from scipy.stats import pearsonr,linregress

    Z = np.array([1,3,5,6,7,8,9,7,10,9])

    X  = np.array([2,5,3,1,6,4,7,8,6,7])
    Y  = np.array([3,2,6,4,6,1,2,5,6,10])

I want to regress out the variability in X and Y from Z. There are two approaches that I know of:

Regress out X from Z first (fit a linear regression of Z on X, take the residual, then repeat with Y on that residual), such that:

    regr = linregress(X, Z)
    resi_1 = Z - (X*regr.slope + regr.intercept)  # residual = y - (m*x + c)

    regr = linregress(Y, resi_1)
    resi_2 = resi_1 - (Y*regr.slope + regr.intercept)  # residual = y - (m*x + c)

Here resi_2 is the remainder of Z after X and Y have been sequentially regressed out.

The alternative is to create a multiple linear regression model in which X and Y together predict Z:

    regr = LinearRegression()
    XY = np.column_stack((X, Y))  # shape (n_samples, 2): one column per predictor
    Model = regr.fit(XY, Z)

    pred = Model.predict(XY)
    resi_3 = Z - pred

The residual from the sequential approach, resi_2, and the residual from the multiple linear regression, resi_3, are very similar (correlation = 0.97) but not equivalent. The two residuals are plotted below:

Any thoughts would be great (I'm not a statistician, so this could be a problem with my understanding rather than with Python!). Note that if, in the first approach, I regress out Y first and then X, I get different residuals.

Answer 1

Here is an example 3D graphical surface fitter using your data and scipy's curve_fit() routine, with scatter, surface, and contour plots. You should be able to click-drag the 3D plots to rotate them in 3-space and see that the data does not appear to lie on any sort of smooth surface, so the flat plane model used here, "z = (a * x) + (b * y) + c", is pretty much no better or worse than any other model for this data.
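
Because the plane model is linear in a, b and c, the same coefficients can also be obtained by ordinary least squares without an iterative solver. The following is only a minimal sketch under that assumption: it uses NumPy's `lstsq` on the question's data, and the names `A`, `coef`, `rmse` and `r_squared` are illustrative rather than taken from the original posts.

```python
import numpy as np

# Data from the question.
X = np.array([2, 5, 3, 1, 6, 4, 7, 8, 6, 7], dtype=float)
Y = np.array([3, 2, 6, 4, 6, 1, 2, 5, 6, 10], dtype=float)
Z = np.array([1, 3, 5, 6, 7, 8, 9, 7, 10, 9], dtype=float)

# Design matrix with columns [x, y, 1], so the solution vector is (a, b, c).
A = np.column_stack([X, Y, np.ones_like(X)])

# Ordinary least squares: minimise ||A @ coef - Z||^2.
coef, *_ = np.linalg.lstsq(A, Z, rcond=None)
a, b, c = coef

# Same goodness-of-fit measures as the curve_fit() script uses.
residuals = Z - A @ coef
rmse = np.sqrt(np.mean(residuals ** 2))
r_squared = 1.0 - np.var(residuals) / np.var(Z)

print("a, b, c:", a, b, c)
print("RMSE:", rmse)
print("R-squared:", r_squared)
```

Up to solver tolerance, this should agree with the parameters, RMSE and R-squared that curve_fit() reports for the same plane.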

Results from the curve_fit() script:

    fitted parameters [ 0.65963199  0.18537117  2.43363301]
    RMSE: 2.11487214206
    R-squared: 0.383078044516

The full script:

    import numpy, scipy, scipy.optimize
    from mpl_toolkits.mplot3d import Axes3D # registers the 3D projection on older matplotlib
    from matplotlib import cm # to colormap 3D surfaces from blue to red
    import matplotlib.pyplot as plt

    graphWidth = 800 # units are pixels
    graphHeight = 600 # units are pixels

    # 3D contour plot lines
    numberOfContourLines = 16


    def SurfacePlot(func, data, fittedParameters):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)

        # create the 3D axes through the figure so they are attached to it
        axes = f.add_subplot(111, projection='3d')
        axes.grid(True)

        x_data = data[0]
        y_data = data[1]
        z_data = data[2]

        # evaluate the fitted model on a regular 20 x 20 grid over the data range
        xModel = numpy.linspace(min(x_data), max(x_data), 20)
        yModel = numpy.linspace(min(y_data), max(y_data), 20)
        X, Y = numpy.meshgrid(xModel, yModel)

        Z = func(numpy.array([X, Y]), *fittedParameters)

        axes.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm, linewidth=1, antialiased=True)

        axes.scatter(x_data, y_data, z_data) # show data along with plotted surface

        axes.set_title('Surface Plot (click-drag with mouse)') # add a title for surface plot
        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label
        axes.set_zlabel('Z Data') # Z axis data label

        plt.show()
        plt.close('all') # clean up after using pyplot or else there can be memory and process problems


    def ContourPlot(func, data, fittedParameters):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
        axes = f.add_subplot(111)

        x_data = data[0]
        y_data = data[1]
        z_data = data[2]

        # evaluate the fitted model on a regular 20 x 20 grid over the data range
        xModel = numpy.linspace(min(x_data), max(x_data), 20)
        yModel = numpy.linspace(min(y_data), max(y_data), 20)
        X, Y = numpy.meshgrid(xModel, yModel)

        Z = func(numpy.array([X, Y]), *fittedParameters)

        axes.plot(x_data, y_data, 'o')

        axes.set_title('Contour Plot') # add a title for contour plot
        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label

        CS = plt.contour(X, Y, Z, numberOfContourLines, colors='k')
        plt.clabel(CS, inline=1, fontsize=10) # labels for contours

        plt.show()
        plt.close('all') # clean up after using pyplot or else there can be memory and process problems


    def ScatterPlot(data):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)

        # create the 3D axes through the figure so they are attached to it
        axes = f.add_subplot(111, projection='3d')
        axes.grid(True)

        x_data = data[0]
        y_data = data[1]
        z_data = data[2]

        axes.scatter(x_data, y_data, z_data)

        axes.set_title('Scatter Plot (click-drag with mouse)')
        axes.set_xlabel('X Data')
        axes.set_ylabel('Y Data')
        axes.set_zlabel('Z Data')

        plt.show()
        plt.close('all') # clean up after using pyplot or else there can be memory and process problems


    def func(data, a, b, c): # example flat surface
        x = data[0]
        y = data[1]
        return (a * x) + (b * y) + c


    if __name__ == "__main__":

        xData = numpy.array([2.0, 5.0, 3.0, 1.0, 6.0, 4.0, 7.0, 8.0, 6.0, 7.0])
        yData = numpy.array([3.0, 2.0, 6.0, 4.0, 6.0, 1.0, 2.0, 5.0, 6.0, 10.0])
        zData = numpy.array([1.0, 3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 7.0, 10.0, 9.0])

        data = [xData, yData, zData]

        initialParameters = [1.0, 1.0, 1.0] # these are the same as scipy default values in this example

        # curve_fit() is a general non-linear least-squares solver;
        # here it fits the flat plane defined in func()
        fittedParameters, pcov = scipy.optimize.curve_fit(func, [xData, yData], zData, p0 = initialParameters)

        ScatterPlot(data)
        SurfacePlot(func, data, fittedParameters)
        ContourPlot(func, data, fittedParameters)

        print('fitted parameters', fittedParameters)

        # func() only reads data[0] and data[1], so passing the full list is fine
        modelPredictions = func(data, *fittedParameters)

        absError = modelPredictions - zData # signed error of each prediction

        SE = numpy.square(absError) # squared errors
        MSE = numpy.mean(SE) # mean squared errors
        RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
        Rsquared = 1.0 - (numpy.var(absError) / numpy.var(zData))
        print('RMSE:', RMSE)
        print('R-squared:', Rsquared)
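
To relate the plane fit back to the two approaches in the question, the sketch below compares three residuals on the same data. It is an illustration rather than part of either original post: `residualize` is a hypothetical helper, `np.polyfit` stands in for `linregress`, and `np.linalg.lstsq` stands in for `LinearRegression`. The idea it assumes is the standard partial-regression result: removing X from Z and then regressing on raw Y only matches the multiple regression when X and Y are uncorrelated, whereas removing X from both Z and Y first does match.

```python
import numpy as np

# Data from the question.
X = np.array([2, 5, 3, 1, 6, 4, 7, 8, 6, 7], dtype=float)
Y = np.array([3, 2, 6, 4, 6, 1, 2, 5, 6, 10], dtype=float)
Z = np.array([1, 3, 5, 6, 7, 8, 9, 7, 10, 9], dtype=float)


def residualize(target, predictor):
    """Residual of a simple OLS fit of target on predictor, with intercept."""
    slope, intercept = np.polyfit(predictor, target, 1)
    return target - (slope * predictor + intercept)


# Multiple regression of Z on X and Y together (the question's resi_3).
A = np.column_stack([X, Y, np.ones_like(X)])
coef, *_ = np.linalg.lstsq(A, Z, rcond=None)
resid_multi = Z - A @ coef

# Sequential approach from the question: remove X from Z, then raw Y from the rest.
resid_seq = residualize(residualize(Z, X), Y)

# Partialled approach: remove X from both Z and Y, then regress one residual on the other.
z_given_x = residualize(Z, X)
y_given_x = residualize(Y, X)
resid_partial = residualize(z_given_x, y_given_x)

print("corr(X, Y):", np.corrcoef(X, Y)[0, 1])
print("sequential matches multiple:", np.allclose(resid_seq, resid_multi))
print("partialled matches multiple:", np.allclose(resid_partial, resid_multi))
```

With this data X and Y are correlated (roughly 0.31), so the first check prints False and the second prints True. Swapping the roles of X and Y in the partialled approach gives the same residual, which is not the case for the sequential approach.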

Answer 1
0 2019-08-02 15:27:43