Logistic Regression Model & Multicollinearity of Categorical Variables in R

I have a training dataset that has 3233 rows and 62 columns. The dependent variable is Happy (train$Happy), which is a binary variable. The other 61 columns are categorical independent variables.

I've created a logistic regression model as follows:

logModel <- glm(Happy ~ ., data = train, family = binomial)

However, I want to reduce the number of independent variables that go into the model, perhaps down to 20 or so. I would like to start by getting rid of collinear categorical variables.

Can someone shed some light on how to determine which categorical variables are collinear, and what threshold I should use when removing a variable from the model?

Thank you!

If your variables were numeric, the obvious solution would be penalized logistic regression (the lasso), which is implemented in R in the glmnet package.
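For reference, here is a minimal sketch of what that lasso fit could look like. It uses simulated numeric predictors rather than the asker's data, so the variable names (x1 to x20) and the effect sizes are made up for illustration; only cv.glmnet() and coef() come from glmnet itself.

```r
library(glmnet)

set.seed(1)
n <- 500
p <- 20

# Simulated numeric predictors (hypothetical stand-in for real data)
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))

# Only x1 and x2 influence the binary outcome; the rest are noise
eta <- 1.5 * x[, "x1"] - 1.5 * x[, "x2"]
y <- rbinom(n, 1, plogis(eta))

# alpha = 1 selects the lasso penalty; lambda is chosen by cross-validation
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the more conservative lambda.1se; zeros are dropped variables
b <- as.matrix(coef(cvfit, s = "lambda.1se"))
kept <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
print(kept)
```

The variables left in kept are the ones the penalty did not shrink to zero. Using s = "lambda.min" instead would usually keep more of them.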

With categorical variables the problem is much more difficult.

I was in a similar situation and used the variable importance plot from the randomForest package to reduce the number of variables. This will not help you find collinearity; it only ranks the variables by importance.
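A rough sketch of that importance ranking, assuming the randomForest package is installed. The data frame toy and its columns x1, x2, x3 are invented for the example; with the real data you would pass train instead.

```r
library(randomForest)

set.seed(42)
n <- 400

# Hypothetical categorical predictors
toy <- data.frame(
  x1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x2 = factor(sample(c("lo", "hi"), n, replace = TRUE)),
  x3 = factor(sample(c("u", "v", "w", "z"), n, replace = TRUE))
)

# The outcome depends on x1 only; x2 and x3 are pure noise
p <- ifelse(toy$x1 == "a", 0.85, 0.15)
toy$Happy <- factor(rbinom(n, 1, p))  # a factor response gives classification

rf <- randomForest(Happy ~ ., data = toy, ntree = 500, importance = TRUE)

# type = 1 is the permutation-based mean decrease in accuracy
imp <- importance(rf, type = 1)
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(ranked)

# The importance plot mentioned above
varImpPlot(rf)
```

You could then keep the top 20 or so rows of ranked as candidates for the logistic regression, bearing in mind that two collinear variables can both rank highly.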

You have only about 60 variables, and you may have some knowledge of the field, so you can also try adding derived variables that make sense to you (like z = x1 - x3, if you think the difference x1 - x3 is important) and then rank them all with a random forest model.

You could use Cramér's V, or the related phi or contingency coefficient (see a great paper at http://www.harding.edu/sbreezeel/460%20files/statbook/chapter15.pdf ), to measure collinearity among categorical variables. Cramér's V is computed for a pair of variables and ranges from 0 (no association) to 1 (perfect association). If a pair of categorical variables has a Cramér's V close to 1, they are highly "correlated" and you may not need to keep both of them in your logistic regression model.
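Here is one way this could be computed in base R, as a sketch rather than a definitive implementation. It uses V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the Pearson chi-squared statistic of the r-by-c contingency table. The helper names cramers_v and cramers_v_matrix, the toy data, and the 0.8 cutoff are all assumptions made for illustration; the answer above does not prescribe a threshold.

```r
# Cramér's V for two categorical vectors (hypothetical helper)
cramers_v <- function(x, y) {
  # Drop unused levels so the table has no empty rows or columns
  tab <- table(droplevels(as.factor(x)), droplevels(as.factor(y)))
  # correct = FALSE: no Yates continuity correction (matters for 2x2 tables)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1  # k = 0 (a constant variable) yields NaN
  as.numeric(sqrt(chi2 / (n * k)))
}

# Symmetric matrix of pairwise Cramér's V for all columns of a data frame
cramers_v_matrix <- function(df) {
  vars <- names(df)
  m <- matrix(1, length(vars), length(vars), dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_len(i - 1)) {
      m[i, j] <- m[j, i] <- cramers_v(df[[i]], df[[j]])
    }
  }
  m
}

# Toy data: b is an exact recoding of a, c is independent of both
set.seed(1)
a <- factor(sample(c("x", "y", "z"), 500, replace = TRUE))
toy <- data.frame(
  a = a,
  b = factor(toupper(as.character(a))),
  c = factor(sample(c("p", "q"), 500, replace = TRUE))
)

v <- cramers_v_matrix(toy)
print(round(v, 3))

# Pairs above an illustrative cutoff of 0.8 (an assumption, not a rule)
high <- which(v > 0.8 & upper.tri(v), arr.ind = TRUE)
print(high)
```

On the asker's data this would be called as cramers_v_matrix(train[, setdiff(names(train), "Happy")]), and one variable from each flagged pair becomes a candidate for removal.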
