[英]Different Robust Standard Errors of Poisson regression in Stata and R
我沒有按照 Hilbe's, J. 2011 程序(此處稱為“書”)在第 20 頁獲得相同的結果。該程序用於使用 glm R. Hilbe's 中的泰坦尼克號數據集計算具有穩健標准誤差的泊松回歸根據以下鏈接,源代碼在表 2.4 中: Negative Binomial Regression Second edition Errata 2012
我相信泰坦尼克號數據集自在此處發布以來發生了一些變化,這是書中的程序和Stata結果以及執行的內容,導致截至 2021 年 7 月 R 中的結果不正確或不同:
library(COUNT)
data("titanic")
attach(titanic)
library(gmodels)
str(titanic)
'data.frame': 1316 obs. of 4 variables:
$ class : Factor w/ 3 levels "3rd class","1st class",..: 2 2 2 2 2 2 2 2 2 2 ...
$ age : Factor w/ 2 levels "child","adults": 2 2 2 2 2 2 2 2 2 2 ...
..- attr(*, "label")= chr "0=child; 1=adult"
..- attr(*, "format")= chr "%10.0g"
..- attr(*, "value.label.table")= Named int 0 1
.. ..- attr(*, "names")= chr "child" "adults"
$ sex : Factor w/ 2 levels "women","man": 2 2 2 2 2 2 2 2 2 2 ...
..- attr(*, "label")= chr "gender: 0=female; 1=male"
..- attr(*, "format")= chr "%8.0g"
..- attr(*, "value.label.table")= Named int 0 1
.. ..- attr(*, "names")= chr "women" "man"
$ survived: num 2 2 2 2 2 2 2 2 2 2 ...
- attr(*, "stata.info")=List of 5
..$ datalabel : chr "Hilbe, Modeling Count Data (CUP, 2014)"
..$ version : int 12
..$ time.stamp : chr "14 Jul 2014 15:12"
..$ val.labels : chr "class" "age" "sex" "survived"
..$ label.table:List of 4
.. ..$ class : Named int 1 2 3 4
.. .. ..- attr(*, "names")= chr "1st class" "2nd class" "3rd class" "crew"
.. ..$ age : Named int 0 1
.. .. ..- attr(*, "names")= chr "child" "adults"
.. ..$ sex : Named int 0 1
.. .. ..- attr(*, "names")= chr "women" "man"
.. ..$ survived: Named int 0 1
.. .. ..- attr(*, "names")= chr "no" "yes"
這本書重新調整了班級。
titanic$class <- relevel(factor(titanic$class), ref=3)
然而,截至 2021 年,“生存”已成為與我認為曾經是二進制 0="no" 和 1="yes" 整數相反的因素,因此,相應地重新編碼了生存
titanic$survived <- as.character(titanic$survived)
titanic$survived [which(titanic$survived =="no")] <- "0"
titanic$survived [which(titanic$survived =="yes")] <- "1"
titanic$survived <- as.integer(titanic$survived)
2012年勘誤表中的代碼:
tit3 <- glm(survived ~ factor(class), family=poisson, data=titanic)
irr <- exp(coef(tit3)) # vector of IRRs
library("sandwich")
rse <- sqrt(diag(vcovHC(tit3, type="HC0"))) # coef robust SEs
irr*rse # IRR robust SEs
R 控制台中的 irr*rse 輸出
(Intercept) factor(class)1st class factor(class)2nd class
0.01634255 0.19270871 0.15723303
使用匯總功能
> summary(tit3)
Call:
glm(formula = survived ~ factor(class), family = poisson, data = titanic)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.1177 -0.7101 -0.7101 0.4364 1.1225
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.37783 0.07495 -18.384 < 2e-16 ***
factor(class)1st class 0.90721 0.10268 8.835 < 2e-16 ***
factor(class)2nd class 0.49603 0.11871 4.179 2.93e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 967.81 on 1315 degrees of freedom
Residual deviance: 889.69 on 1313 degrees of freedom
AIC: 1893.7
Number of Fisher Scoring iterations: 5
以下被認為是正確的估計,因為它是書中的內容。 事故率比率 (IRR) 是:
class2: 1.642184
class1: 2.477407
和估計和穩健標准。 呃。
class2: 0.1572928
class1: 0.192782
都有 P>|z| == 0. 有人可以確認嗎? 謝謝
確認的!
data('titanic', package="COUNT")
titanic <- transform(titanic, survived=as.numeric(survived) - 1,
class=relevel(class, ref=3))
tit3 <- glm(survived ~ class, family=poisson, data=titanic)
library(sandwich);library(lmtest)
(smy <- coeftest(tit3, vcovHC(tit3, type="HC0")))
# z test of coefficients:
#
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -1.377832 0.064819 -21.2565 < 2.2e-16 ***
# class1st class 0.907212 0.077786 11.6629 < 2.2e-16 ***
# class2nd class 0.496027 0.095746 5.1806 2.211e-07 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(irr <- exp(coef(tit3)))
# (Intercept) class1st class class2nd class
# 0.2521246 2.4774071 1.6421841
rse <- sqrt(diag(vcovHC(tit3, type="HC0")))
irr*rse
# (Intercept) class1st class class2nd class
# 0.01634255 0.19270871 0.15723303
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.