
Linear regression with matplotlib / numpy

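I'm trying to fit a linear regression to a scatter plot I have generated. My data is in list format, and every polyfit example I can find uses arange, which doesn't accept lists. I have searched high and low for how to convert a list to an array, and nothing seems clear. Am I missing something?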

Following on, how best can I use my lists of integers as inputs to polyfit?

Here is the polyfit example I am following:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(data)
y = np.arange(data)

m, b = np.polyfit(x, y, 1)

plt.plot(x, y, 'yo', x, m*x+b, '--k')
plt.show()

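arange generates lists (well, numpy arrays); type help(np.arange) for the details. You don't need to call it on existing lists, because polyfit accepts plain lists directly: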

>>> x = [1,2,3,4]
>>> y = [3,5,7,9] 
>>> 
>>> m,b = np.polyfit(x, y, 1)
>>> m
2.0000000000000009
>>> b
0.99999999999999833
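If an explicit array is still wanted, for example to write m*x + b with element-wise arithmetic, a minimal sketch reusing the lists above could look like the following. np.asarray and np.polyval are existing NumPy functions; the variable names are only illustrative.

```python
import numpy as np

x_list = [1, 2, 3, 4]
y_list = [3, 5, 7, 9]

# Convert the plain Python lists to NumPy arrays.
x = np.asarray(x_list, dtype=float)
y = np.asarray(y_list, dtype=float)

# Degree-1 fit: coefficients are returned highest power first (slope, intercept).
m, b = np.polyfit(x, y, 1)

# Evaluate the fitted line on the array; both forms give the same values.
y_line = m * x + b
y_line_alt = np.polyval([m, b], x)
```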

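I should add that I tend to use poly1d here rather than write out "m*x+b" and the higher-order equivalents, so my version of your code would look something like this: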

import numpy as np
import matplotlib.pyplot as plt

x = [1,2,3,4]
y = [3,5,7,10] # 10, not 9, so the fit isn't perfect

coef = np.polyfit(x,y,1)
poly1d_fn = np.poly1d(coef) 
# poly1d_fn is now a function which takes in x and returns an estimate for y

plt.plot(x,y, 'yo', x, poly1d_fn(x), '--k')
plt.xlim(0, 5)
plt.ylim(0, 12)

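As a rough illustration of the "higher-order equivalents" mentioned above, the same two calls work for a quadratic fit. This is only a sketch: the data below is made up so that it lies exactly on x**2 + 1, which makes the result easy to check.

```python
import numpy as np

x = [0, 1, 2, 3, 4]
y = [1, 2, 5, 10, 17]  # made-up data lying exactly on x**2 + 1

# Degree-2 fit: coefficients [a, b, c] for a*x**2 + b*x + c.
coef = np.polyfit(x, y, 2)

# Wrap the coefficients in a callable polynomial.
quad_fn = np.poly1d(coef)

y_hat = quad_fn(x)   # evaluates on lists and arrays
y_at_5 = quad_fn(5)  # and on scalars
```

The plotting call stays the same as in the linear case: pass quad_fn(x) instead of poly1d_fn(x).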

This code:

from scipy.stats import linregress

linregress(x,y) #x and y are arrays or lists.

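gives out a result with the following fields:

slope : float
    slope of the regression line
intercept : float
    intercept of the regression line
r-value : float
    correlation coefficient
p-value : float
    two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero
stderr : float
    standard error of the estimate

Source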
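The returned object can also be read by attribute name rather than by position. A small sketch on made-up data, using the slope, intercept, rvalue, pvalue and stderr attributes of the linregress result:

```python
from scipy.stats import linregress

x = [1, 2, 3, 4]
y = [3, 5, 7, 10]  # made-up data, not a perfect line

res = linregress(x, y)

slope = res.slope
intercept = res.intercept
r_squared = res.rvalue ** 2  # coefficient of determination
p_value = res.pvalue
std_err = res.stderr         # standard error of the slope estimate

print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```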

Use statsmodels.api.OLS to get a detailed breakdown of the fit, coefficients and residuals:

import statsmodels.api as sm

df = sm.datasets.get_rdataset('Duncan', 'carData').data
y = df['income']
x = df['education']

model = sm.OLS(y, sm.add_constant(x))
results = model.fit()

print(results.params)
# const        10.603498 <- intercept
# education     0.594859 <- slope
# dtype: float64

print(results.summary())
#                             OLS Regression Results                            
# ==============================================================================
# Dep. Variable:                 income   R-squared:                       0.525
# Model:                            OLS   Adj. R-squared:                  0.514
# Method:                 Least Squares   F-statistic:                     47.51
# Date:                Thu, 28 Apr 2022   Prob (F-statistic):           1.84e-08
# Time:                        00:02:43   Log-Likelihood:                -190.42
# No. Observations:                  45   AIC:                             384.8
# Df Residuals:                      43   BIC:                             388.5
# Df Model:                           1                                         
# Covariance Type:            nonrobust                                         
# ==============================================================================
#                  coef    std err          t      P>|t|      [0.025      0.975]
# ------------------------------------------------------------------------------
# const         10.6035      5.198      2.040      0.048       0.120      21.087
# education      0.5949      0.086      6.893      0.000       0.421       0.769
# ==============================================================================
# Omnibus:                        9.841   Durbin-Watson:                   1.736
# Prob(Omnibus):                  0.007   Jarque-Bera (JB):               10.609
# Skew:                           0.776   Prob(JB):                      0.00497
# Kurtosis:                       4.802   Cond. No.                         123.
# ==============================================================================
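To see where a few of the numbers in such a summary come from, here is a NumPy-only sketch on a small made-up dataset (not the Duncan data). It computes the residuals, R-squared and adjusted R-squared by hand; it is an illustration of the definitions, not a replacement for the statsmodels output.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])  # made-up data

# Ordinary least-squares line.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

# R-squared: share of the variance in y explained by the fit.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Adjusted R-squared with n observations and k = 1 predictor.
n, k = len(x), 1
adj_r_squared = 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)
```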

New in matplotlib 3.3.0

To plot the best-fit line, just pass the slope m and intercept b into the new plt.axline:

import matplotlib.pyplot as plt

# extract intercept b and slope m
b, m = results.params

# plot y = m*x + b
plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')
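For a version that runs on its own, without the statsmodels results object, a minimal sketch on made-up data might look like this. The Agg backend and the output file name fit.png are assumptions made so the sketch runs without a display; drop them and call plt.show() for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])  # made-up data

# Slope and intercept of the least-squares line.
m, b = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.plot(x, y, 'yo', label='data')

# axline extends across the whole axes, so no x range is needed for the line.
ax.axline(xy1=(0, b), slope=m, linestyle='--', color='k',
          label=f'$y = {m:.1f}x {b:+.1f}$')
ax.legend()
fig.savefig('fit.png')  # hypothetical output file name
```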

Note that the slope m and intercept b can be easily extracted from any of the common regression methods:

  • numpy.polyfit

     import numpy as np
     m, b = np.polyfit(x, y, deg=1)
     plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')

  • scipy.stats.linregress

     from scipy import stats
     m, b, *_ = stats.linregress(x, y)
     plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')

  • statsmodels.api.OLS

     import statsmodels.api as sm
     b, m = sm.OLS(y, sm.add_constant(x)).fit().params
     plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')

  • sklearn.linear_model.LinearRegression

     from sklearn.linear_model import LinearRegression
     reg = LinearRegression().fit(x[:, None], y)
     b = reg.intercept_
     m = reg.coef_[0]
     plt.axline(xy1=(0, b), slope=m, label=f'$y = {m:.1f}x {b:+.1f}$')

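The methods listed above all solve the same least-squares problem, so they should agree up to floating-point error. A small sketch that checks this for numpy.polyfit and scipy.stats.linregress on made-up data, with np.linalg.lstsq added as a third cross-check (that extra method is not part of the list above):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 10.0])  # made-up data

# numpy.polyfit returns (slope, intercept) for deg=1.
m_np, b_np = np.polyfit(x, y, deg=1)

# linregress returns slope and intercept first; the rest is ignored here.
m_sp, b_sp, *_ = stats.linregress(x, y)

# Direct least squares on the design matrix [x, 1].
A = np.column_stack([x, np.ones_like(x)])
(m_ls, b_ls), *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose([m_np, b_np], [m_sp, b_sp]),
      np.allclose([m_np, b_np], [m_ls, b_ls]))
```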
import numpy as np
import matplotlib.pyplot as plt 
from scipy import stats

x = np.array([1.5,2,2.5,3,3.5,4,4.5,5,5.5,6])
y = np.array([10.35,12.3,13,14.0,16,17,18.2,20,20.7,22.5])
gradient, intercept, r_value, p_value, std_err = stats.linregress(x,y)
mn=np.min(x)
mx=np.max(x)
x1=np.linspace(mn,mx,500)
y1=gradient*x1+intercept
plt.plot(x,y,'ob')
plt.plot(x1,y1,'-r')
plt.show()

Use this.

Another quick and dirty answer is that you can just convert your list to an array using:

import numpy as np
arr = np.asarray(listname)

from pylab import *
import numpy as np

x1 = [1, 2, 3, 4]   # for example, this is a list
y1 = [3, 5, 7, 10]  # for example, this is a list
x = np.array(x1)    # this converts a list into an array
y = np.array(y1)
m, b = polyfit(x, y, 1)

plot(x, y, 'yo', x, m*x+b, '--k')
show()
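Both np.array and np.asarray turn a list into an array. The difference only shows when the input is already an array: np.array copies by default, while np.asarray returns the same object. A short sketch, with illustrative variable names:

```python
import numpy as np

values = [1, 2, 3, 4]

a = np.array(values)    # list -> new array
b = np.asarray(values)  # list -> new array as well

c = np.array(a)         # copies an existing array by default
d = np.asarray(a)       # returns the same array object, no copy

c[0] = 99               # does not affect `a`
d[1] = -1               # changes `a`, because `d` is `a`
```

For feeding lists into polyfit either call is fine; the distinction matters mainly for large arrays or when the result is modified in place.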

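Linear regression is a good example to start with for artificial intelligence.

Here is a good example of a machine learning algorithm, multiple linear regression, using Python: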

##### Predicting House Prices Using Multiple Linear Regression - @Y_T_Akademi
    
#### In this project we are gonna see how machine learning algorithms help us predict house prices. Linear Regression is a model of predicting new future data by using the existing correlation between the old data. Here, machine learning helps us identify this relationship between feature data and output, so we can predict future values.

import pandas as pd

##### we use sklearn library in many machine learning calculations..

from sklearn import linear_model

##### we import our dataset: housepricesdataset.csv

df = pd.read_csv("housepricesdataset.csv", sep=";")

##### The following is our feature set:
X = df[['area', 'roomcount', 'buildingage']]
##### The following is the output (result) data:
y = df['price']
##### we define a linear regression model here:

reg = linear_model.LinearRegression()
reg.fit(X, y)

# Since our model is ready, we can make predictions now.
# Let's predict the price of a 230 square meter house with 4 rooms in a 10-year-old building:

reg.predict([[230,4,10]])

# Now a 230 square meter house with 6 rooms in a brand-new (0-year-old) building:
reg.predict([[230,6,0]])

# Now a 355 square meter house with 3 rooms in a 20-year-old building:
reg.predict([[355,3,20]])

# You can make as many predictions as you want in one call:
reg.predict([[230,4,10], [230,6,0], [355,3,20], [275, 5, 17]])
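Since housepricesdataset.csv is not available here, the following NumPy-only sketch shows the same idea on entirely made-up numbers. The rows, coefficients and intercept are hypothetical and chosen so the fit can be checked; np.linalg.lstsq stands in for LinearRegression, and both solve an ordinary least-squares problem.

```python
import numpy as np

# Hypothetical rows: area (m^2), room count, building age (years).
X = np.array([
    [100, 2, 30],
    [150, 3, 25],
    [200, 4, 10],
    [230, 6, 0],
    [275, 5, 17],
    [355, 3, 20],
], dtype=float)

# Synthetic prices built from known coefficients, so the fit can be verified.
true_coef = np.array([1000.0, 5000.0, -200.0])
true_intercept = 20000.0
price = X @ true_coef + true_intercept

# Append a column of ones so the intercept is estimated as well.
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, price, rcond=None)
coef, intercept = solution[:3], solution[3]

# Predict a 230 m^2 house with 4 rooms in a 10-year-old building.
predicted = np.array([230.0, 4.0, 10.0]) @ coef + intercept
```

Because the synthetic prices contain no noise, the recovered coefficients match the ones used to generate them; with real data they would only be estimates.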

My dataset is below:

[Image: screenshot of the dataset]

George's answer goes quite nicely with matplotlib's axline, which plots an infinite line.

from scipy.stats import linregress
import matplotlib.pyplot as plt

reg = linregress(x, y)
plt.axline(xy1=(0, reg.intercept), slope=reg.slope, linestyle="--", color="k")
