Negative confidence interval in linear regression despite all positive values
I am getting a negative confidence interval on a linear regression plot even though all data points are positive. Why is this happening? I believe this negative confidence interval will also affect my R^2 score. Is that right?
Code used is:
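import matplotlib.pyplot as plt
import seaborn as sns
# df_mx2 is the DataFrame from the question (its construction is not shown)
sns.regplot(x='Consumer Confidence Index_1', y='Sales (ALV sources)', data=df_mx2)
plt.show()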
One of the foundational assumptions of linear regression is that the residuals, i.e. the scatter of the data about the fitted line, are normally distributed. In your case you have data on the right side and on the left side with a big gap in the middle. As such, you should double-check that a linear regression is appropriate for your analysis.
That being said, rest easy: the negative confidence interval will NOT affect your R² value. R² is computed only from the fitted line and the observed points; the confidence band is not an input to it.
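To make that concrete, here is a minimal sketch of how R² is computed for a straight-line fit. The helper name `r_squared` and the demo arrays are made up for illustration and are not the poster's data; the point is only that nothing about a confidence band enters the calculation.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # fit y = slope * x + intercept
    residuals = y - (slope * x + intercept)     # vertical distances to the line
    ss_res = np.sum(residuals ** 2)             # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)        # total variation about the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical, all-positive data (NOT the data from the question)
x_demo = np.array([40.0, 41.0, 42.0, 43.0, 44.0, 50.0, 51.0, 52.0])
y_demo = np.array([5.0, 9.0, 6.0, 12.0, 10.0, 30.0, 28.0, 33.0])
r2_demo = r_squared(x_demo, y_demo)
print(r2_demo)
```

Only the fitted slope, the intercept and the observed points appear here, so the value is the same however wide the plotted band is.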
The reason for the negative confidence interval has to do with the sparsity of data at x<42. If the three points on the right side were removed, the regression would have a positive slope and would intersect the x axis around x=42. If that line were extended to x=30 or so, its predicted value would be very negative. As such, the data suggests that, to reach the confidence level you have set, the confidence interval must be very wide in order to include fits that line up with that steeper regression line.
This can be interpreted as the data providing very little predictive ability below x=42.
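The widening of the band can be sketched with a simple bootstrap. According to the seaborn documentation, `regplot` estimates its band by bootstrapping; the code below is not seaborn's implementation, just a rough numpy approximation of the same idea. The function name `bootstrap_band`, its parameters and the demo data are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_band(x, y, x_grid, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap band for the fitted line, evaluated on x_grid."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    preds = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample points with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], 1)
        preds[b] = slope * x_grid + intercept        # refitted line on the grid
    tail = (100 - ci) / 2
    lower = np.percentile(preds, tail, axis=0)
    upper = np.percentile(preds, 100 - tail, axis=0)
    return lower, upper

# Hypothetical, all-positive data with a gap in x (NOT the data from the question)
x_demo = np.array([42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 55.0, 56.0, 57.0])
y_demo = np.array([2.0, 8.0, 5.0, 14.0, 11.0, 18.0, 16.0, 20.0, 17.0, 22.0])
x_grid = np.linspace(30, 60, 31)                     # extends left of the data, like the plot axis
lo_demo, hi_demo = bootstrap_band(x_demo, y_demo, x_grid)
width_demo = hi_demo - lo_demo                       # band width at each grid point
print(lo_demo[0], hi_demo[0], width_demo.min())
```

The band is narrowest near the middle of the observed x values and grows as the grid moves away from them, because small changes in the refitted slope are magnified with distance. That is how the lower edge can fall below zero in a region with no data even though every observed y is positive.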