Is there a way to get the x-axis and y-axis values of my seaborn plot?
I used the seaborn library to fit a regression line to my data. Then I also plotted the residual plot. I now need to see the histogram distribution of my residuals. How can I do that, given that I don't have the values plotted in the graph?
Here is my code:
fig,axes = plt.subplots(1,3,figsize=(15,5))
sns.regplot(x = 'Radio',y='Sales',data=df_advertising,ax = axes[0])
sns.residplot(x = 'Radio',y='Sales',data=df_advertising,ax = axes[1])
How can I get the values of my residual plot so that I can plot the corresponding histogram to see the distribution?
Thanks, any help will be appreciated. I'm just a beginner.
It's not really possible to get the fit or the fitted values back from seaborn (see also this question). It also makes more sense to fit the model yourself, so that you know exactly what was fitted, and then plot the residuals. Below I use an example dataset:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import statsmodels.api as sm
df_advertising = pd.DataFrame({'Radio':np.random.randint(1,10,100)})
df_advertising['Sales'] = 3*df_advertising['Radio'] + np.random.normal(10,4,100)
We can plot it using seaborn:
fig,axes = plt.subplots(1,2,figsize=(10,5))
sns.regplot(x = 'Radio',y='Sales',data=df_advertising,ax = axes[0])
sns.residplot(x = 'Radio',y='Sales',data=df_advertising,ax = axes[1])
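As an aside on the title question: points drawn with a scatter call can usually be read back from the matplotlib Axes itself. The sketch below uses plain matplotlib with made-up numbers; that the points seaborn draws in a residual plot end up as the first entry of `ax.collections` is an assumption you should check on your own figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for the x positions and residuals on a plot
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.5, -0.3, 0.1, -0.2, 0.4])

fig, ax = plt.subplots()
ax.scatter(x, y)

# scatter() adds a PathCollection to the Axes; get_offsets() returns
# the plotted (x, y) pairs as an N x 2 array
offsets = np.asarray(ax.collections[0].get_offsets())
x_plotted, y_plotted = offsets[:, 0], offsets[:, 1]
```

This only recovers what was drawn, not the fitted model, which is why fitting the model yourself is still the cleaner route.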
seaborn's default fit here is an ordinary least-squares line (it relies on statsmodels only for some of its other fit options), so let's use statsmodels to fit the same model and get the predictions:
mod = sm.OLS(df_advertising['Sales'],sm.add_constant(df_advertising['Radio']))
res = mod.fit()
test = df_advertising[['Radio']].drop_duplicates().sort_values('Radio')
predictions = res.get_prediction(sm.add_constant(test))
predictions = pd.concat([test,predictions.summary_frame(alpha=0.05)],axis=1)
predictions.head()
    Radio       mean   mean_se  mean_ci_lower  mean_ci_upper  obs_ci_lower  obs_ci_upper
13      1  11.132902  0.700578       9.742628      12.523175      3.862061     18.403742
6       2  14.480520  0.582916      13.323742      15.637298      7.250693     21.710347
2       3  17.828139  0.478925      16.877728      18.778550     10.628448     25.027829
4       4  21.175757  0.399429      20.383104      21.968411     13.995189     28.356326
10      5  24.523376  0.360990      23.807002      25.239750     17.350827     31.695924
In the above, I create test so that repeated data points are not predicted more than once (my x values are counts, so many of them are duplicated). Now we have everything we need to plot. The residuals are simply stored under the resid attribute of the fitted statsmodels results object:
fig,axes = plt.subplots(1,3,figsize=(15,5))
sns.scatterplot(x='Radio',y='Sales',ax=axes[0],data=df_advertising)
axes[0].plot(predictions['Radio'], predictions['mean'], lw=2)
axes[0].fill_between(x=predictions['Radio'],
                     y1=predictions['mean_ci_lower'],y2=predictions['mean_ci_upper'],
                     facecolor='blue', alpha=0.2)
sns.scatterplot(x='Radio',y='Sales',ax=axes[1],
                data=pd.DataFrame({'Radio':df_advertising['Radio'],
                                   'Sales':res.resid})
                )
axes[1].axhline(0, ls='--',color="k")
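# distplot is deprecated since seaborn 0.11; histplot with kde=True draws a comparable histogram
sns.histplot(res.resid,ax=axes[2],bins=20,kde=True,stat='density')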
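If you would rather not pull in statsmodels, the residuals of a straight-line fit can also be computed by hand. The following is a minimal sketch using only NumPy; it swaps in `np.polyfit` for the statsmodels fit above, and the seed and the variable names `radio`, `sales`, `fitted` and `residuals` are my own choices for illustration.

```python
import numpy as np

# Synthetic data in the same spirit as the example dataset above
rng = np.random.default_rng(0)
radio = rng.integers(1, 10, 100).astype(float)
sales = 3 * radio + rng.normal(10, 4, 100)

# Least-squares straight line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(radio, sales, deg=1)

# Residual = observed value minus the value predicted by the line
fitted = intercept + slope * radio
residuals = sales - fitted

# Histogram counts and bin edges, i.e. the numbers behind a histogram plot
counts, bin_edges = np.histogram(residuals, bins=20)
```

To draw it, something like axes[2].hist(residuals, bins=20) on an existing matplotlib Axes should give the same bars.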