Interpreting how R codifies dummy response variable in Logistic Regression

I am a newbie having trouble interpreting the output of my logistic regression. My response variable has two values, "multiplex" and "subterraneus". When I use the factor() function on the "microtus.train" data frame, I get "multiplex" and "subterraneus", in that order. After fitting the model and predicting the response, I have trouble understanding what the probabilities mean. Do they represent the probability of an observation being "subterraneus"? When I ran contrasts(microtus.train$Group), I got the table below.

> contrasts(microtus.train$Group)
             subterraneus
multiplex               0
subterraneus            1

Based on this table, my interpretation is that the model predicts the probability of "subterraneus" (not the probability of "multiplex"), because "subterraneus" is dummy coded as 1. Is my assumption correct?

My code is given below. I appreciate your help in advance.

library(Flury)
data(microtus, package = "Flury")

str(microtus)
summary(microtus)

# Creating training & test data frames
microtus.train <- subset(microtus, 
                     microtus$Group %in% c("multiplex", "subterraneus"), 
                     select = c("Group", "M1Left", "M2Left", "M3Left", 
                                "Foramen", "Pbone","Length", "Height",
                                "Rostrum") )

# Drop the unused third factor level, then print the remaining levels
microtus.train$Group <- droplevels(microtus.train$Group)
factor(microtus.train$Group)


nullModel.GLM <- glm(Group ~ 1, data = microtus.train, 
                     family = binomial())
fullModel.GLM <- glm(Group ~ ., data = microtus.train, 
                     family = binomial())
summary(nullModel.GLM)
summary(fullModel.GLM)

stepFwd.GLM <- step(nullModel.GLM, scope = list(upper = fullModel.GLM), 
                    direction = 'forward', k = 2)
stepFwd.GLM.fitResults <- predict(stepFwd.GLM, type = 'response')
stepFwd.GLM.fitResults

contrasts(microtus.train$Group)

It's not the contrasts that matter, but the order of the factor levels (contrasts specify how the predictor variables are encoded as dummy variables). From ?glm:

For 'binomial' and 'quasibinomial' families the response can also be specified as a 'factor' (when the first level denotes failure and all others success)
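To see what that sentence implies in practice, here is a small sketch on simulated data. The variables x and group are made up for illustration and are not the microtus data. Under that assumption, fitting with the factor response should give the same coefficients as fitting with an explicit 0/1 response in which 1 marks every observation that is not in the first level.

```r
# Simulated two-level response (hypothetical data, not microtus)
set.seed(42)
x <- rnorm(150)
group <- factor(ifelse(runif(150) < plogis(1.5 * x),
                       "subterraneus", "multiplex"))

# Fit with the factor used directly as the response
fit_factor <- glm(group ~ x, family = binomial())

# Explicit 0/1 coding: 0 = first level ("failure"), 1 = any other level ("success")
y01 <- as.integer(group != levels(group)[1])
fit_01 <- glm(y01 ~ x, family = binomial())

# Compare the two sets of coefficients
all.equal(coef(fit_factor), coef(fit_01))
```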

Since R orders factor levels alphabetically by default, "multiplex" is (probably) the first level and "subterraneus" is the second, hence the logistic regression is predicting the probability of "subterraneus". You can check this with levels(microtus$Group), and adjust it if necessary by using factor() with the levels argument set explicitly.
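As a rough illustration of checking and changing the level order, the sketch below again uses simulated data (x and group are hypothetical stand-ins, not the microtus measurements). It fits the model once with the default alphabetical order and once with the levels reversed via factor(..., levels = ...). The fitted probabilities from the two fits should be complementary.

```r
# Simulated two-level response (hypothetical data, not microtus)
set.seed(1)
n <- 200
x <- rnorm(n)
group <- factor(ifelse(runif(n) < plogis(2 * x),
                       "subterraneus", "multiplex"))

# Default order is alphabetical: the second level is the modelled "success"
levels(group)

fit <- glm(group ~ x, family = binomial())
prob <- predict(fit, type = "response")    # P(group == levels(group)[2])

# Reverse the order so that "multiplex" becomes the modelled outcome
group2 <- factor(group, levels = c("subterraneus", "multiplex"))
fit2 <- glm(group2 ~ x, family = binomial())
prob2 <- predict(fit2, type = "response")  # P(group2 == "multiplex")

# The two sets of fitted probabilities should sum to 1 for each observation
all.equal(unname(prob), 1 - unname(prob2))
```

If the goal is only to swap which of two levels comes first, relevel(group, ref = "subterraneus") is an alternative to spelling out the full levels vector.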
