Different robust standard errors for Poisson regression in Stata and R

Question

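I cannot reproduce the results on page 20 of Hilbe, J. (2011) (referred to here as "the book"). The procedure computes a Poisson regression with robust standard errors using glm in R on the titanic dataset. According to the following link, Hilbe's source code is in Table 2.4: Negative Binomial Regression Second edition Errata 2012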

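I believe the titanic dataset has changed since it was published there. Below are the program from the book, the Stata results, and what I ran, which gives incorrect or at least different results in R as of July 2021: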

library(COUNT)
data("titanic")
attach(titanic)
library(gmodels)

str(titanic)
'data.frame':   1316 obs. of  4 variables:
 $ class   : Factor w/ 3 levels "3rd class","1st class",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ age     : Factor w/ 2 levels "child","adults": 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "0=child; 1=adult"
  ..- attr(*, "format")= chr "%10.0g"
  ..- attr(*, "value.label.table")= Named int  0 1
  .. ..- attr(*, "names")= chr  "child" "adults"
 $ sex     : Factor w/ 2 levels "women","man": 2 2 2 2 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "gender: 0=female; 1=male"
  ..- attr(*, "format")= chr "%8.0g"
  ..- attr(*, "value.label.table")= Named int  0 1
  .. ..- attr(*, "names")= chr  "women" "man"
 $ survived: num  2 2 2 2 2 2 2 2 2 2 ...
 - attr(*, "stata.info")=List of 5
  ..$ datalabel  : chr "Hilbe, Modeling Count Data (CUP, 2014)"
  ..$ version    : int 12
  ..$ time.stamp : chr "14 Jul 2014 15:12"
  ..$ val.labels : chr  "class" "age" "sex" "survived"
  ..$ label.table:List of 4
  .. ..$ class   : Named int  1 2 3 4
  .. .. ..- attr(*, "names")= chr  "1st class" "2nd class" "3rd class" "crew"
  .. ..$ age     : Named int  0 1
  .. .. ..- attr(*, "names")= chr  "child" "adults"
  .. ..$ sex     : Named int  0 1
  .. .. ..- attr(*, "names")= chr  "women" "man"
  .. ..$ survived: Named int  0 1
  .. .. ..- attr(*, "names")= chr  "no" "yes"

The book relevels class to change the reference category:

titanic$class <- relevel(factor(titanic$class), ref=3)

However, as of 2021, survived is a factor rather than, as I believe it used to be, a binary integer with 0 = "no" and 1 = "yes", so I recoded survived accordingly:

titanic$survived <- as.character(titanic$survived)
titanic$survived [which(titanic$survived =="no")] <- "0"
titanic$survived [which(titanic$survived =="yes")] <- "1"
titanic$survived <- as.integer(titanic$survived)
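
A more compact way to do the same recode is to compare against the "yes" level. The sketch below uses a hypothetical toy factor rather than the real column. It also shows a common pitfall: calling as.integer() directly on a factor returns the internal level codes (1, 2), not 0/1.

```r
# Toy stand-in for titanic$survived (hypothetical values, not the real data)
survived <- factor(c("no", "yes", "yes", "no"), levels = c("no", "yes"))

# Pitfall: as.integer() on a factor gives the level codes, i.e. 1 and 2
codes <- as.integer(survived)

# 0/1 recode: the logical comparison is coerced to 1 (TRUE) or 0 (FALSE)
survived01 <- as.integer(survived == "yes")

print(codes)       # 1 2 2 1
print(survived01)  # 0 1 1 0
```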

Code from the 2012 errata:

tit3 <- glm(survived ~ factor(class), family=poisson, data=titanic)
irr <- exp(coef(tit3)) # vector of IRRs
library("sandwich")
rse <- sqrt(diag(vcovHC(tit3, type="HC0"))) # coef robust SEs
irr*rse # IRR robust SEs

Output of irr*rse in the R console:

 (Intercept) factor(class)1st class factor(class)2nd class 
            0.01634255             0.19270871             0.15723303
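
For context, irr*rse is the delta-method approximation SE(exp(b)) ≈ exp(b) · SE(b). The sketch below uses only numbers printed in this post. It divides the product back by the IRR to recover the coefficient-scale robust standard error and the Wald z statistic. The assumption is that the test is carried out on the coefficient (log) scale, as is usual for GLMs.

```r
# Coefficients from summary(tit3) and the irr*rse output, copied from this post
b      <- c(class1 = 0.90721, class2 = 0.49603)
irr_se <- c(class1 = 0.19270871, class2 = 0.15723303)

irr <- exp(b)               # incidence rate ratios
rse <- irr_se / irr         # undo the delta-method scaling: robust SE of b
z   <- b / rse              # Wald z statistic on the log scale
p   <- 2 * pnorm(-abs(z))   # two-sided p-value

print(round(cbind(irr, irr_se, rse, z), 4))
print(p)
```

A confidence interval for an IRR is usually built as exp(b ± 1.96 * rse) rather than irr ± 1.96 * irr_se, so the delta-method value is mainly useful for reporting.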

Using the summary function:

> summary(tit3)

Call:
glm(formula = survived ~ factor(class), family = poisson, data = titanic)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1177  -0.7101  -0.7101   0.4364   1.1225  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)            -1.37783    0.07495 -18.384  < 2e-16 ***
factor(class)1st class  0.90721    0.10268   8.835  < 2e-16 ***
factor(class)2nd class  0.49603    0.11871   4.179 2.93e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 967.81  on 1315  degrees of freedom
Residual deviance: 889.69  on 1313  degrees of freedom
AIC: 1893.7

Number of Fisher Scoring iterations: 5
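
The Std. Error column above is model-based: it assumes the Poisson variance equals the mean. With a 0/1 response the variance is smaller than the mean, so robust standard errors should come out smaller. Below is a sketch of an HC0 sandwich computed by hand in base R on simulated data. The data, group labels, and probabilities are invented for illustration, and this is not a claim about how sandwich::vcovHC is implemented internally.

```r
set.seed(1)
# Simulated 0/1 outcome in three made-up groups (not the titanic data)
n <- 500
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- rbinom(n, 1, unname(c(a = 0.25, b = 0.60, c = 0.40)[as.character(g)]))

fit <- glm(y ~ g, family = poisson)
X   <- model.matrix(fit)
mu  <- fitted(fit)

# Poisson with log link:
#   bread = (X' diag(mu) X)^-1, meat = X' diag((y - mu)^2) X
bread <- solve(crossprod(X, mu * X))
meat  <- crossprod(X, (y - mu)^2 * X)
hc0   <- bread %*% meat %*% bread

model_se  <- sqrt(diag(vcov(fit)))  # what summary() reports
robust_se <- sqrt(diag(hc0))        # HC0 sandwich, no small-sample factor
print(round(cbind(model_se, robust_se), 4))
```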

The following are taken to be the correct estimates, since they are what the book reports. The incidence rate ratios (IRR) are:

class2: 1.642184 
class1: 2.477407

and the robust standard errors are:

class2: 0.1572928
class1: 0.192782

Both have P>|z| == 0. Can someone confirm? Thanks.

Answer 1

Confirmed!

data('titanic', package="COUNT")
titanic <- transform(titanic, survived=as.numeric(survived) - 1,
                     class=relevel(class, ref=3))

tit3 <- glm(survived ~ class, family=poisson, data=titanic)

library(sandwich);library(lmtest)
(smy <- coeftest(tit3, vcovHC(tit3, type="HC0")))
# z test of coefficients:
#   
#                 Estimate Std. Error  z value  Pr(>|z|)    
# (Intercept)    -1.377832   0.064819 -21.2565 < 2.2e-16 ***
# class1st class  0.907212   0.077786  11.6629 < 2.2e-16 ***
# class2nd class  0.496027   0.095746   5.1806 2.211e-07 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(irr <- exp(coef(tit3)))
# (Intercept) class1st class class2nd class 
#   0.2521246      2.4774071      1.6421841 

rse <- sqrt(diag(vcovHC(tit3, type="HC0")))
irr*rse
# (Intercept) class1st class class2nd class 
#  0.01634255     0.19270871     0.15723303
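
These values agree with the book only to about three decimal places (0.19270871 vs 0.192782). The remaining gap is consistent with a finite-sample factor of n/(n - 1) on the variance. As far as I know, Stata applies that factor to its robust variance by default while HC0 does not, so treat that attribution as an assumption. The arithmetic, with n = 1316 taken from the str() output in the question:

```r
n <- 1316  # number of observations, from the str() output in the question

# HC0-based IRR standard errors, copied from the output above
hc0_irr_se <- c("class1st class" = 0.19270871, "class2nd class" = 0.15723303)

# Scale the standard errors by sqrt(n / (n - 1))
adj <- hc0_irr_se * sqrt(n / (n - 1))
print(round(adj, 7))
```

The adjusted values round to 0.192782 and 0.1572928, which are the figures quoted from the book.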
