
Interpreting how R encodes a binary factor response variable in logistic regression

I am new to R and am having trouble interpreting the output of my logistic regression. My response variable has two values, “multiplex” and “subterraneus”. When I call factor() on the Group column of the “microtus.train” data frame, the levels are listed as “multiplex” and “subterraneus”, in that order. After fitting the model and predicting the response, I do not understand what the predicted probabilities mean. Are they the probabilities of an observation being “subterraneus”? When I ran “contrasts(microtus.train$Group)”, I got the table below.

> contrasts(microtus.train$Group)
             subterraneus
multiplex               0
subterraneus            1

Based on this table, my interpretation is that the model predicts the probability of “subterraneus” (not of “multiplex”), because “subterraneus” is dummy coded as 1. Is my assumption correct?

My code is given below. Thank you in advance for your help.

library(Flury)
data(microtus, package = "Flury")

str(microtus)
summary(microtus)

# Create the training data frame: keep the two labelled species and the
# response plus the eight predictor columns
microtus.train <- subset(microtus,
                         Group %in% c("multiplex", "subterraneus"),
                         select = c("Group", "M1Left", "M2Left", "M3Left",
                                    "Foramen", "Pbone", "Length", "Height",
                                    "Rostrum"))

# Drop the unused third factor level
microtus.train$Group <- droplevels(microtus.train$Group)
# Print the response so the remaining levels and their order are visible
factor(microtus.train$Group)


nullModel.GLM <- glm(Group ~ 1, data = microtus.train, 
                     family = binomial())
fullModel.GLM <- glm(Group ~ ., data = microtus.train, 
                     family = binomial())
summary(nullModel.GLM)
summary(fullModel.GLM)

stepFwd.GLM <- step(nullModel.GLM, scope = list(upper = fullModel.GLM), 
                    direction = 'forward', k = 2)
stepFwd.GLM.fitResults <- predict(stepFwd.GLM, type = 'response')
stepFwd.GLM.fitResults

contrasts(microtus.train$Group)

It is not the contrasts that matter here but the order of the factor levels; contrasts specify how factor predictor variables are encoded as dummy variables, not how the response is coded. From ?glm:

For 'binomial' and 'quasibinomial' families the response can also be specified as a 'factor' (when the first level denotes failure and all others success)

Since R sorts factor levels alphabetically by default, "multiplex" is (probably) the first level and "subterraneus" the second, so the logistic regression models the probability of "subterraneus", and predict(..., type = 'response') returns that probability. You can check this with levels(microtus.train$Group), and change it if necessary by calling factor() with the levels argument set explicitly.
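
To make the level-order rule concrete, here is a minimal sketch on simulated data. It does not use the real microtus measurements: the predictor x, the sample size and the slope are made up for illustration, and only the two level names are borrowed from the question. It checks that a factor response gives the same fitted probabilities as a hand-coded 0/1 response in which 1 means "subterraneus", and that relevel() flips which level is modelled.

```r
# Simulated stand-in for the two-species data (assumed, not the microtus data)
set.seed(1)
n <- 200
x <- rnorm(n)
# Larger x makes "subterraneus" more likely
p <- plogis(2 * x)
group <- factor(ifelse(runif(n) < p, "subterraneus", "multiplex"))

# Levels are sorted alphabetically; glm() treats the first one as "failure"
levels(group)

fit <- glm(group ~ x, family = binomial())

# Same model with an explicit 0/1 response where 1 means "subterraneus"
y01 <- as.integer(group == "subterraneus")
fit01 <- glm(y01 ~ x, family = binomial())

# TRUE if the fitted probabilities are P(second level) = P("subterraneus")
all.equal(unname(fitted(fit)), unname(fitted(fit01)))

# Making "subterraneus" the first level flips the modelled event
group2 <- relevel(group, ref = "subterraneus")
fit2 <- glm(group2 ~ x, family = binomial())

# TRUE if the new fitted values are the complements of the old ones
all.equal(unname(fitted(fit2)), 1 - unname(fitted(fit)))
```

If you would rather have the model report the probability of "multiplex", relevel(microtus.train$Group, ref = "subterraneus") before fitting is one way to do it; the coefficients then change sign and the fitted probabilities become their complements.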
