Linear Regression: How to find the distance between the points and the prediction line?
I'm looking to find the distance between the points and the prediction line. Ideally, I would like the results to be displayed in a new column called 'Distance'.
My imports:
import os.path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
%matplotlib inline
Sample of my data:
idx Exam Results Hours Studied
0 93 8.232795
1 94 7.879095
2 92 6.972698
3 88 6.854017
4 91 6.043066
5 87 5.510013
6 89 5.509297
My code so far:
x = df['Hours Studied'].values[:,np.newaxis]
y = df['Exam Results'].values
model = LinearRegression()
model.fit(x, y)
plt.scatter(x, y,color='r')
plt.plot(x, model.predict(x),color='k')
plt.show()
Any help would be greatly appreciated. Thanks
You simply need to assign the difference between y and model.predict(x) to a new column (or take the absolute value if you only want the magnitude of the difference):
#df["Distance"] = abs(y - model.predict(x)) # if you only want magnitude
df["Distance"] = y - model.predict(x)
print(df)
# Exam Results Hours Studied Distance
#0 93 8.232795 -0.478739
#1 94 7.879095 1.198511
#2 92 6.972698 0.934043
#3 88 6.854017 -2.838712
#4 91 6.043066 1.714063
#5 87 5.510013 -1.265269
#6 89 5.509297 0.736102
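For anyone who wants to run this without the original file, here is a minimal self-contained sketch. It assumes the DataFrame holds only the seven sample rows from the question (the full dataset is not shown), and it builds df by hand rather than loading it. On these seven rows it should reproduce the Distance values printed above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumption: only the seven sample rows from the question are used.
df = pd.DataFrame({
    "Exam Results": [93, 94, 92, 88, 91, 87, 89],
    "Hours Studied": [8.232795, 7.879095, 6.972698, 6.854017,
                      6.043066, 5.510013, 5.509297],
})

x = df[["Hours Studied"]].to_numpy()   # 2-D array, shape (n_samples, 1)
y = df["Exam Results"].to_numpy()      # 1-D array, shape (n_samples,)

model = LinearRegression().fit(x, y)

# Signed vertical distance: positive when the point lies above the line.
df["Distance"] = y - model.predict(x)
print(df)
```

Selecting the feature with double brackets (df[["Hours Studied"]]) gives the 2-D shape that fit expects, which is what .values[:, np.newaxis] does in the question.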
Subtracting the predictions works because your model predicts a y (dependent variable) for each value of the independent variable (x). Each point and its prediction share the same x coordinate, so the difference in y is the vertical distance from the point to the line, which is the value you want.
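The difference in y is the vertical distance (the residual). If what you need is the shortest, perpendicular distance to the line, one possible sketch is to divide the absolute residual by sqrt(1 + slope^2). This is not part of the answer above; it is an assumption about what "distance" might mean. It uses np.polyfit instead of LinearRegression to keep the example short, with the seven sample rows from the question. Note that a perpendicular distance mixes the units of both axes (hours and exam points), so it is only meaningful if that makes sense for your data.

```python
import numpy as np

# Assumption: the seven sample rows from the question.
hours = np.array([8.232795, 7.879095, 6.972698, 6.854017,
                  6.043066, 5.510013, 5.509297])
scores = np.array([93, 94, 92, 88, 91, 87, 89], dtype=float)

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(hours, scores, 1)

# Signed vertical distance from each point to the fitted line.
residuals = scores - (slope * hours + intercept)

# Perpendicular distance to the line y = slope * x + intercept.
perpendicular = np.abs(residuals) / np.sqrt(1.0 + slope**2)

print(perpendicular)
```

The perpendicular distance is never larger than the absolute vertical distance, and the two coincide only when the fitted line is horizontal.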