Inconsistent results between glm() in R and manual implementation of logistic regression in Excel
You'll find a manual implementation of logistic regression in Excel at: http://blog.excelmasterseries.com/2014/06/logistic-regression-performed-in-excel.html
This implementation uses the dataset below and reports the following coefficients:
b0 = 12.48285608
b1 = -0.117031374
b2 = -1.469140055
However, when I analyze the same dataset with glm() in R, the results are not the same, i.e.:
b0 = 1.687445
b1 = -0.012525
b2 = -0.116473
d <- structure(list(Y = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X1 = c(78L, 73L, 73L,
71L, 68L, 59L, 57L, 49L, 35L, 27L, 59L, 57L, 44L, 38L, 36L, 36L,
22L, 22L, 15L, 10L), X2 = c(8L, 8L, 5L, 7L, 5L, 4L, 7L, 5L, 4L,
7L, 3L, 4L, 5L, 5L, 4L, 2L, 6L, 5L, 4L, 6L)), .Names = c("Y",
"X1", "X2"), class = "data.frame", row.names = c(NA, -20L))
summary(glm(Y ~ X1+X2, data=d), family=binomial(link='logit'))
# > summary(glm(Y ~ X1+X2, data=d), family=binomial(link='logit'))
#
# Call:
# glm(formula = Y ~ X1 + X2, data = d)
#
# Deviance Residuals:
#      Min        1Q    Median        3Q       Max
# -0.78318  -0.20641   0.07689   0.24375   0.49237
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  1.687445   0.319872   5.275 6.18e-05 ***
# X1          -0.012525   0.004376  -2.862   0.0108 *
# X2          -0.116473   0.056959  -2.045   0.0567 .
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for gaussian family taken to be 0.146843)
#
# Null deviance: 5.0000 on 19 degrees of freedom
# Residual deviance: 2.4963 on 17 degrees of freedom
# AIC: 23.139
#
# Number of Fisher Scoring iterations: 2
Why do the results differ?
You have the family argument in the wrong place. It should be in the glm() call, not the summary() call:
summary(glm(Y ~ X1+X2, data=d, family=binomial(link='logit')))
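If you don't pass family to glm(), it falls back to its default, gaussian, and fits an ordinary linear regression. That is why the output in the question reports a "Dispersion parameter for gaussian family". summary() silently ignores the stray family argument, so no error or warning is raised.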