Plotting a simple linear regression model goes wrong
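I want to create a linear regression model that shows the positive correlation between BMI and disease risk (a quantitative measure of disease progression one year after baseline).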
The dataset is the same data that ships with sklearn (sklearn.datasets.load_diabetes).
Here is the URL: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt
I imported the whole table with read_csv(args) and called it "data".
df = DataFrame({'BMI': data['BMI'], 'Target': data['Y']}).sort_values('BMI')
df.plot.scatter('BMI', 'Target')
model = LinearRegression(fit_intercept=True)
model.fit(data[['BMI']], data['Y'])
x_test = np.linspace(data['BMI'].min(), data['BMI'].max())
y_pred = model.predict(x_test[:, np.newaxis])
df.plot(x_test, y_pred, linestyle=":", color="red")
When I try this, it gives me a long error message that I don't understand. Why does this happen?
I think what you want is:
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
from matplotlib import pyplot as plt
[...]
df = pd.DataFrame({'BMI': data['BMI'], 'Target': data['Y']}).sort_values('BMI')
model = LinearRegression(fit_intercept=True)
model.fit(data[['BMI']], data['Y'])
x_test = np.linspace(data['BMI'].min(), data['BMI'].max())
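# Predict from a DataFrame with the same column name used in fit(); this avoids
# scikit-learn's "X does not have valid feature names" warning
y_pred = model.predict(pd.DataFrame({'BMI': x_test}))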
plt.scatter(df['BMI'].values, df['Target'].values)
plt.plot(x_test, y_pred, linestyle="-", color="red")
plt.show()
Your version with df.plot(x_test, y_pred, ...) raises the error because DataFrame.plot only plots data from the DataFrame it is called on: its x and y arguments are column labels (or positions) in that DataFrame, not arrays of values. It is not a general-purpose plotting function like pyplot.plot(x, y).