简体   繁体   中英

Multiple linear regression with python

I would like to calculate multiple linear regression with python. I found this code for simple linear regression

import numpy as np

from matplotlib.pyplot import *

x = np.array([1, 2, 3, 4, 5])

y = np.array([2, 3, 4, 4, 5])

n = np.max(x.shape)    

X = np.vstack([np.ones(n), x]).T


a = np.linalg.lstsq(X, y)[0]

So, a is the coefficient, but I don't see what [0] means ?

And how can I change the code to obtain multiple linear regressions ?

To implement multiple linear regression with python you can use any of the following options:

1) Use normal equation method (that uses matrix inverse)
2) Numpy's least-squares numpy.linalg.lstsq tool
3) Numpy's np.linalg.solve tool

For normal equations method you can use this formula: 在此输入图像描述
In above formula X is feature matrix and y is label vector.

As for Numpy's numpy.linalg.lstsq or np.linalg.solve tools you just use them out of the box.

Here I provide a link for sample data that you can use for tests: https://drive.google.com/file/d/0BzzUvSbpsTAvN1UxTkxXd2U0eVE/view

Alternative: https://www.dropbox.com/s/e3pd7fp0rfm1cfs/DB2.csv?dl=0

Data preparation code:

import pandas as pd
import numpy as np

path = 'DB2.csv'  
data = pd.read_csv(path, header=None, delimiter=";")

data.insert(0, 'Ones', 1)
cols = data.shape[1]

X = data.iloc[:,0:cols-1]  
y = data.iloc[:,cols-1:cols] 

IdentitySize = X.shape[1]
IdentityMatrix= np.zeros((IdentitySize, IdentitySize))
np.fill_diagonal(IdentityMatrix, 1)

For least squares method you use Numpy's numpy.linalg.lstsq . Here is Python code:

lamb = 1
th = np.linalg.lstsq(X.T.dot(X) + lamb * IdentityMatrix, X.T.dot(y))[0]            

Also you can use np.linalg.solve tool of numpy:

lamb = 1
XtX_lamb = X.T.dot(X) + lamb * IdentityMatrix
XtY = X.T.dot(y)
x = np.linalg.solve(XtX_lamb, XtY);

For normal equation method use:

lamb = 1
xTx = X.T.dot(X) + lamb * IdentityMatrix
XtX = np.linalg.inv(xTx)
XtX_xT = XtX.dot(X.T)
theta = XtX_xT.dot(y)

In all methods regularization is used. Here is results (theta coefficients) to see difference between these three approaches:

Normal equation:        np.linalg.lstsq         np.linalg.solve
[-27551.99918303]       [-27551.95276154]       [-27551.9991855]
[-940.27518383]         [-940.27520138]         [-940.27518383]
[-9332.54653964]        [-9332.55448263]        [-9332.54654461]
[-3149.02902071]        [-3149.03496582]        [-3149.02900965]
[-1863.25125909]        [-1863.2631435]         [-1863.25126344]
[-2779.91105618]        [-2779.92175308]        [-2779.91105347]
[-1226.60014026]        [-1226.61033117]        [-1226.60014192]
[-920.73334259]         [-920.74331432]         [-920.73334194]
[-6278.44238081]        [-6278.45496955]        [-6278.44237847]
[-2001.48544938]        [-2001.49566981]        [-2001.48545349]
[-715.79204971]         [-715.79664124]         [-715.79204921]
[ 4039.38847472]        [ 4039.38302499]        [ 4039.38847515]
[-2362.54853195]        [-2362.55280478]        [-2362.54853139]
[-12730.8039209]        [-12730.80866036]       [-12730.80392076]
[-24872.79868125]       [-24872.80203459]       [-24872.79867954]
[-3402.50791863]        [-3402.5140501]         [-3402.50793382]
[ 253.47894001]         [ 253.47177732]         [ 253.47892472]
[-5998.2045186]         [-5998.20513905]        [-5998.2045184]
[ 198.40560401]         [ 198.4049081]          [ 198.4056042]
[ 4368.97581411]        [ 4368.97175688]        [ 4368.97581426]
[-2885.68026222]        [-2885.68154407]        [-2885.68026205]
[ 1218.76602731]        [ 1218.76562838]        [ 1218.7660275]
[-1423.73583813]        [-1423.7369068]         [-1423.73583793]
[ 173.19125007]         [ 173.19086525]         [ 173.19125024]
[-3560.81709538]        [-3560.81650156]        [-3560.8170952]
[-142.68135768]         [-142.68162508]         [-142.6813575]
[-2010.89489111]        [-2010.89601322]        [-2010.89489092]
[-4463.64701238]        [-4463.64742877]        [-4463.64701219]
[ 17074.62997704]       [ 17074.62974609]       [ 17074.62997723]
[ 7917.75662561]        [ 7917.75682048]        [ 7917.75662578]
[-4234.16758492]        [-4234.16847544]        [-4234.16758474]
[-5500.10566329]        [-5500.106558]          [-5500.10566309]
[-5997.79002683]        [-5997.7904842]         [-5997.79002634]
[ 1376.42726683]        [ 1376.42629704]        [ 1376.42726705]
[ 6056.87496151]        [ 6056.87452659]        [ 6056.87496175]
[ 8149.0123667]         [ 8149.01209157]        [ 8149.01236827]
[-7273.3450484]         [-7273.34480382]        [-7273.34504827]
[-2010.61773247]        [-2010.61839251]        [-2010.61773225]
[-7917.81185096]        [-7917.81223606]        [-7917.81185084]
[ 8247.92773739]        [ 8247.92774315]        [ 8247.92773722]
[ 1267.25067823]        [ 1267.24677734]        [ 1267.25067832]
[ 2557.6208133]         [ 2557.62126916]        [ 2557.62081337]
[-5678.53744654]        [-5678.53820798]        [-5678.53744647]
[ 3406.41697822]        [ 3406.42040997]        [ 3406.41697836]
[-8371.23657044]        [-8371.2361594]         [-8371.23657035]
[ 15010.61728285]       [ 15010.61598236]       [ 15010.61728304]
[ 11006.21920273]       [ 11006.21711213]       [ 11006.21920284]
[-5930.93274062]        [-5930.93237071]        [-5930.93274048]
[-5232.84459862]        [-5232.84557665]        [-5232.84459848]
[ 3196.89304277]        [ 3196.89414431]        [ 3196.8930428]
[ 15298.53309912]       [ 15298.53496877]       [ 15298.53309919]
[ 4742.68631183]        [ 4742.6862601]         [ 4742.68631172]
[ 4423.14798495]        [ 4423.14765013]        [ 4423.14798546]
[-16153.50854089]       [-16153.51038489]       [-16153.50854123]
[-22071.50792741]       [-22071.49808389]       [-22071.50792408]
[-688.22903323]         [-688.2310229]          [-688.22904006]
[-1060.88119863]        [-1060.8829114]         [-1060.88120546]
[-101.75750066]         [-101.75776411]         [-101.75750831]
[ 4106.77311898]        [ 4106.77128502]        [ 4106.77311218]
[ 3482.99764601]        [ 3482.99518758]        [ 3482.99763924]
[-1100.42290509]        [-1100.42166312]        [-1100.4229119]
[ 20892.42685103]       [ 20892.42487476]       [ 20892.42684422]
[-5007.54075789]        [-5007.54265501]        [-5007.54076473]
[ 11111.83929421]       [ 11111.83734144]       [ 11111.83928704]
[ 9488.57342568]        [ 9488.57158677]        [ 9488.57341883]
[-2992.3070786]         [-2992.29295891]        [-2992.30708529]
[ 17810.57005982]       [ 17810.56651223]       [ 17810.57005457]
[-2154.47389712]        [-2154.47504319]        [-2154.47390285]
[-5324.34206726]        [-5324.33913623]        [-5324.34207293]
[-14981.89224345]       [-14981.8965674]        [-14981.89224973]
[-29440.90545197]       [-29440.90465897]       [-29440.90545704]
[-6925.31991443]        [-6925.32123144]        [-6925.31992383]
[ 104.98071593]         [ 104.97886085]         [ 104.98071152]
[-5184.94477582]        [-5184.9447972]         [-5184.94477792]
[ 1555.54536625]        [ 1555.54254362]        [ 1555.5453638]
[-402.62443474]         [-402.62539068]         [-402.62443718]
[ 17746.15769322]       [ 17746.15458093]       [ 17746.15769074]
[-5512.94925026]        [-5512.94980649]        [-5512.94925267]
[-2202.8589276]         [-2202.86226244]        [-2202.85893056]
[-5549.05250407]        [-5549.05416936]        [-5549.05250669]
[-1675.87329493]        [-1675.87995809]        [-1675.87329255]
[-5274.27756529]        [-5274.28093377]        [-5274.2775701]
[-5424.10246845]        [-5424.10658526]        [-5424.10247326]
[-1014.70864363]        [-1014.71145066]        [-1014.70864845]
[ 12936.59360437]       [ 12936.59168749]       [ 12936.59359954]
[ 2912.71566077]        [ 2912.71282628]        [ 2912.71565599]
[ 6489.36648506]        [ 6489.36538259]        [ 6489.36648021]
[ 12025.06991281]       [ 12025.07040848]       [ 12025.06990358]
[ 17026.57841531]       [ 17026.56827742]       [ 17026.57841044]
[ 2220.1852193]         [ 2220.18531961]        [ 2220.18521579]
[-2886.39219026]        [-2886.39015388]        [-2886.39219394]
[-18393.24573629]       [-18393.25888463]       [-18393.24573872]
[-17591.33051471]       [-17591.32838012]       [-17591.33051834]
[-3947.18545848]        [-3947.17487999]        [-3947.18546459]
[ 7707.05472816]        [ 7707.05577227]        [ 7707.0547217]
[ 4280.72039079]        [ 4280.72338194]        [ 4280.72038435]
[-3137.48835901]        [-3137.48480197]        [-3137.48836531]
[ 6693.47303443]        [ 6693.46528167]        [ 6693.47302811]
[-13936.14265517]       [-13936.14329336]       [-13936.14267094]
[ 2684.29594641]        [ 2684.29859601]        [ 2684.29594183]
[-2193.61036078]        [-2193.63086307]        [-2193.610366]
[-10139.10424848]       [-10139.11905454]       [-10139.10426049]
[ 4475.11569903]        [ 4475.12288711]        [ 4475.11569421]
[-3037.71857269]        [-3037.72118246]        [-3037.71857265]
[-5538.71349798]        [-5538.71654224]        [-5538.71349794]
[ 8008.38521357]        [ 8008.39092739]        [ 8008.38521361]
[-1433.43859633]        [-1433.44181824]        [-1433.43859629]
[ 4212.47144667]        [ 4212.47368097]        [ 4212.47144686]
[ 19688.24263706]       [ 19688.2451694]        [ 19688.2426368]
[ 104.13434091]         [ 104.13434349]         [ 104.13434091]
[-654.02451175]         [-654.02493111]         [-654.02451174]
[-2522.8642551]         [-2522.88694451]        [-2522.86424254]
[-5011.20385919]        [-5011.22742915]        [-5011.20384655]
[-13285.64644021]       [-13285.66951459]       [-13285.64642763]
[-4254.86406891]        [-4254.88695873]        [-4254.86405637]
[-2477.42063206]        [-2477.43501057]        [-2477.42061727]
[ 0.]                   [  1.23691279e-10]      [ 0.]
[-92.79470071]          [-92.79467095]          [-92.79470071]
[ 2383.66211583]        [ 2383.66209637]        [ 2383.66211583]
[-10725.22892185]       [-10725.22889937]       [-10725.22892185]
[ 234.77560283]         [ 234.77560254]         [ 234.77560283]
[ 4739.22119578]        [ 4739.22121432]        [ 4739.22119578]
[ 43640.05854156]       [ 43640.05848841]       [ 43640.05854157]
[ 2592.3866707]         [ 2592.38671547]        [ 2592.3866707]
[-25130.02819215]       [-25130.05501178]       [-25130.02819515]
[ 4966.82173096]        [ 4966.7946407]         [ 4966.82172795]
[ 14232.97930665]       [ 14232.9529959]        [ 14232.97930363]
[-21621.77202422]       [-21621.79840459]       [-21621.7720272]
[ 9917.80960029]        [ 9917.80960571]        [ 9917.80960029]
[ 1355.79191536]        [ 1355.79198092]        [ 1355.79191536]
[-27218.44185748]       [-27218.46880642]       [-27218.44185719]
[-27218.04184348]       [-27218.06875423]       [-27218.04184318]
[ 23482.80743869]       [ 23482.78043029]       [ 23482.80743898]
[ 3401.67707434]        [ 3401.65134677]        [ 3401.67707463]
[ 3030.36383274]        [ 3030.36384909]        [ 3030.36383274]
[-30590.61847724]       [-30590.63933424]       [-30590.61847706]
[-28818.3942685]        [-28818.41520495]       [-28818.39426833]
[-25115.73726772]       [-25115.7580278]        [-25115.73726753]
[ 77174.61695995]       [ 77174.59548773]       [ 77174.61696016]
[-20201.86613672]       [-20201.88871113]       [-20201.86613657]
[ 51908.53292209]       [ 51908.53446495]       [ 51908.53292207]
[ 7710.71327865]        [ 7710.71324194]        [ 7710.71327865]
[-16206.9785119]        [-16206.97851993]       [-16206.9785119]

As you can see normal equation, least squares and np.linalg.solve tool methods give to some extent different results.

to extend it to Multiple Linear Regression all you have to do is to create a multi dimensional x instead of a one dimension x.

ie,

x = np.array([[1, 2, 3,4,5], [4, 5, 6,7,8]], np.int32)

http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

and with respect to a[0] that is called the intercept in a linear regression, ie,

y = a + bx + error, a[0] = a, a[1] = b

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM