
Calculating VIF for ordinal logistic regression & multicollinearity in R

I am running an ordinal regression model. I have 8 explanatory variables: 4 are binary categorical ('0' or '1') and 4 are continuous. Before fitting, I want to be sure there is no multicollinearity, so I use the variance inflation factor (the vif function from the car package):

library(MASS)  # polr()
library(car)   # vif()
mod1 <- polr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, Hess = TRUE, data = df)
vif(mod1)

But I get a VIF value of 125 for one of the variables, as well as the following warning:

Warning message:
In vif.default(mod1) : No intercept: vifs may not be sensible.

However, when I convert my dependent variable to numeric (instead of a factor) and do the same thing with a linear model:

# Y has been converted from a factor to numeric at this point
mod2 <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df)
vif(mod2)

This time all the VIF values are below 3, suggesting that there's no multicollinearity.

I am confused about the vif function. How can it return VIFs > 100 for one model and low VIFs for another? Should I stick with the second result and fit the ordinal model anyway?

The vif() function uses determinants of the correlation matrix of the parameter estimates (and subsets thereof) to calculate the VIF. In the linear model, this includes just the regression coefficients (excluding the intercept). The vif() function wasn't intended to be used with ordered logit models. So, when it finds the variance-covariance matrix of the parameters, it includes the threshold parameters (i.e., intercepts), which would normally be excluded by the function in a linear model. This is why you get that warning: it doesn't know to look for threshold parameters and remove them. Since the VIF is really a function of inter-correlations in the design matrix (which doesn't depend on the dependent variable or on the non-linear mapping from the linear predictor into the space of the response variable [i.e., the link function in a glm]), you should get the right answer with your second solution above, using lm() with a numeric version of your dependent variable.
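To make the last point concrete, here is a minimal sketch on simulated data. The variables x1, x2, x3 and the helper vif_from_vcov are made up for this illustration and are not from the question. It computes VIFs from the predictors alone, then from the coefficient covariance of two lm() fits (one on the numeric version of the ordinal response, one on pure noise), and all three should agree. The optional last part, which runs only if MASS is installed, shows that a polr fit has no "(Intercept)" coefficient while its vcov() still carries the threshold rows.

```r
set.seed(42)
n <- 500

# Simulated predictors (hypothetical stand-ins for the X1..X8 in the question)
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)        # deliberately correlated with x1
x3 <- rbinom(n, 1, 0.5)          # a 0/1 indicator
X  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# Ordered three-level response generated from a latent logistic variable
latent <- x1 - 0.5 * x2 + 0.8 * x3 + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)

# (a) VIF from the predictors alone: diagonal of the inverse correlation matrix
vif_design <- diag(solve(cor(X)))

# (b) Hypothetical helper: VIF from a fitted lm's coefficient covariance,
#     dropping the intercept row and column first
vif_from_vcov <- function(fit) {
  V <- vcov(fit)[-1, -1]
  diag(solve(cov2cor(V)))
}
vif_lm    <- vif_from_vcov(lm(as.numeric(y) ~ x1 + x2 + x3))
vif_noise <- vif_from_vcov(lm(rnorm(n) ~ x1 + x2 + x3))  # unrelated response

# The three rows should match up to numerical tolerance
print(rbind(vif_design, vif_lm, vif_noise))

# (c) Optional: inspect what a polr fit exposes (skipped if MASS is missing)
if (requireNamespace("MASS", quietly = TRUE)) {
  fit_polr <- MASS::polr(y ~ x1 + x2 + x3, Hess = TRUE)
  print(names(coef(fit_polr)))     # slope names only, no "(Intercept)"
  print(rownames(vcov(fit_polr)))  # slopes plus the threshold parameters
}
```

Because the response never enters the calculation in (a), the agreement of the three rows is what you would expect if the VIF depends only on the design matrix, which is why the lm()-based values are the ones to trust here.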
