
How to interpret linear regression assumptions of the data

I have a data set with 3719 observations and 9 features. I fitted a multiple linear regression on a selected subset of the features and got diagnostic plots like this:

[Image: diagnostic plots]

From the plots, I understand that the relationship between the dependent and independent variables is not linear, because the residual plots show a non-linear trend. The normal Q-Q plot, however, suggests that the residuals follow a normal distribution. What I do not understand is the meaning of the residuals vs leverage plot.

Am I understanding this correctly? How should I interpret these plots?

Your residuals exhibit heteroscedasticity (top-left), meaning that their variability is not constant: the spread of your outcome around the fitted line increases with the values of the outcome. For example, income vs expenditure: wealthier people have more variability in the price of the food they buy (they sometimes buy cheap food and sometimes buy expensive food), while poorer people tend to buy only cheap food.
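If you want a number to go with the visual impression, one option is a Breusch-Pagan-style check. The sketch below is only an illustration on synthetic data (the variable names, sample size and noise model are my assumptions, not your data): it regresses the squared residuals on the predictors and uses n * R^2 (Koenker's studentized form) as the statistic, which is compared against a chi-square distribution with as many degrees of freedom as there are predictors.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return beta, fitted, y - fitted

def breusch_pagan_lm(X, resid):
    """n * R^2 from the auxiliary regression of squared residuals on X."""
    u2 = resid ** 2
    _, fitted_u2, _ = ols_fit(X, u2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    ss_res = np.sum((u2 - fitted_u2) ** 2)
    return len(u2) * (1.0 - ss_res / ss_tot)

# Synthetic illustration only (assumed data, one predictor).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

# Noise whose standard deviation grows with x, versus constant noise.
y_hetero = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n) * x
y_homo = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n) * 5.0

lm_hetero = breusch_pagan_lm(X, ols_fit(X, y_hetero)[2])
lm_homo = breusch_pagan_lm(X, ols_fit(X, y_homo)[2])
print(f"LM statistic, growing variance:  {lm_hetero:.1f}")
print(f"LM statistic, constant variance: {lm_homo:.1f}")
```

With one predictor the 5% critical value of the chi-square distribution is about 3.84, so a statistic far above that points the same way as a fan-shaped residual plot.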

The QQ-plot (bottom-left) assesses the assumption that the residuals are normally distributed, and I see nothing to suggest a serious violation. There is a slight departure at the top right, but it is not as serious as your heteroscedasticity problem.
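To make clear what a QQ-plot actually draws, here is a minimal sketch, again on made-up samples rather than your residuals. It pairs the sorted, standardized residuals with the normal quantiles at plotting positions (i - 0.5) / n (one common convention, and an assumption here) and reports how well the pairs line up. The helper name `qq_points` is mine.

```python
import numpy as np
from statistics import NormalDist

def qq_points(resid):
    """Return (theoretical normal quantiles, sorted standardized residuals)."""
    resid = np.asarray(resid, dtype=float)
    ordered = np.sort((resid - resid.mean()) / resid.std(ddof=1))
    n = len(ordered)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, ordered

def qq_correlation(resid):
    """Correlation of the QQ points; close to 1 when they lie on a line."""
    theoretical, ordered = qq_points(resid)
    return float(np.corrcoef(theoretical, ordered)[0, 1])

# Synthetic illustration only: a normal sample and a clearly skewed one.
rng = np.random.default_rng(0)
normal_resid = rng.normal(0.0, 1.0, size=500)
skewed_resid = rng.exponential(1.0, size=500)

r_normal = qq_correlation(normal_resid)
r_skewed = qq_correlation(skewed_resid)
print(f"QQ correlation, normal sample: {r_normal:.4f}")
print(f"QQ correlation, skewed sample: {r_skewed:.4f}")
```

Points that hug the straight line give a correlation very close to 1; a few points drifting away in one tail, as in your plot, lower it only slightly.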

The residuals vs leverage plot (bottom-right) flags points that may have a large influence on your results, as measured by Cook's distance. This can help identify outliers in your data, and you could consider omitting these before fitting another model (a rather subjective assessment).
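For reference, the two quantities behind that plot can be computed directly. The sketch below uses synthetic data with one deliberately planted point (index 0) that sits far from the other x values and off the line; the data and the function name `influence_measures` are assumptions for illustration. Leverage is the diagonal of the hat matrix, and Cook's distance combines each residual with its leverage.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage (hat-matrix diagonal) and Cook's distance for an OLS fit.

    X must already contain an intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # With the reduced QR factorization X = QR, the hat matrix is Q Q^T,
    # so its diagonal is the row-wise sum of squares of Q.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)
    s2 = resid @ resid / (n - p)  # residual variance estimate
    cooks = resid ** 2 / (p * s2) * leverage / (1.0 - leverage) ** 2
    return leverage, cooks

# Synthetic illustration only.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)
x[0], y[0] = 30.0, 21.0  # planted point: extreme x and far below the line
X = np.column_stack([np.ones(n), x])

leverage, cooks = influence_measures(X, y)
most_influential = int(np.argmax(cooks))
print("most influential observation:", most_influential)
print(f"its leverage: {leverage[most_influential]:.2f}, "
      f"Cook's distance: {cooks[most_influential]:.2f}")
```

A common rule of thumb treats a Cook's distance above 1 as worth a closer look, but that cut-off is a convention rather than a test, which is part of why the decision to drop a point stays subjective.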
