简体   繁体   中英

R's sandwich package producing strange results for robust standard errors in linear model

I am trying to find heteroskedasticity-robust standard errors in R, and most solutions I find are to use the coeftest and sandwich packages. However, when I use those packages, they seem to produce queer results (they're way too significant). Both my professor and I agree that the results don't look right. Could someone please tell me where my mistake is? Am I using the right package? Does the package have a bug in it? What should I use instead? Or can you reproduce the same results in STATA?

(The data is CPS data from 2010 to 2014, March samples. I created a MySQL database to hold the data and am using the survey package to help analyze it.)

Thank you in advance. (I have abridged the code somewhat to make it easier to read; let me know if you need to see more.)

>male.nat.reg <- svyglm(log(HOURWAGE) ~ AGE + I(AGE^2) + ... + OVERWORK, subset(fwyrnat2010.design, FEMALE == 0))

>summary(male.nat.reg)

Call:
NextMethod(formula = "svyglm", design)

Survey design:
subset(fwyrnat2010.design, FEMALE == 0)

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.599e+00  6.069e-02  26.350  < 2e-16 ***
AGE           4.030e-02  3.358e-03  12.000  < 2e-16 ***
I(AGE^2)     -4.131e-04  4.489e-05  -9.204 9.97e-16 ***
NOHSDEG      -1.730e-01  1.281e-02 -13.510  < 2e-16 ***
ASSOC         1.138e-01  1.256e-02   9.060 2.22e-15 ***
SOMECOLL      5.003e-02  9.445e-03   5.298 5.11e-07 ***
BACHELOR      2.148e-01  1.437e-02  14.948  < 2e-16 ***
GRADUATE      3.353e-01  3.405e-02   9.848  < 2e-16 ***
INMETRO       3.879e-02  9.225e-03   4.205 4.93e-05 ***
NCHILDOLD     1.374e-02  4.197e-03   3.273 0.001376 ** 
NCHILDYOUNG   2.334e-02  6.186e-03   3.774 0.000247 ***
NOTWHITE     -5.026e-02  8.583e-03  -5.856 3.92e-08 ***
MARRIED      -8.226e-03  1.531e-02  -0.537 0.592018    
NEVERMARRIED -4.644e-02  1.584e-02  -2.932 0.004009 ** 
NOTCITIZEN   -6.759e-02  1.574e-02  -4.295 3.47e-05 ***
STUDENT      -1.231e-01  1.975e-02  -6.231 6.52e-09 ***
VET           3.336e-02  1.751e-02   1.905 0.059091 .  
INUNION       2.366e-01  1.271e-02  18.614  < 2e-16 ***
PROFOCC       2.559e-01  1.661e-02  15.413  < 2e-16 ***
TSAOCC        9.997e-02  1.266e-02   7.896 1.27e-12 ***
FFFOCC        2.076e-02  2.610e-02   0.795 0.427859    
PRODOCC       2.164e-01  1.281e-02  16.890  < 2e-16 ***
LABOROCC      6.074e-02  1.253e-02   4.850 3.60e-06 ***
AFFIND        6.834e-02  2.941e-02   2.324 0.021755 *  
MININGIND     3.034e-01  3.082e-02   9.846  < 2e-16 ***
CONSTIND      1.451e-01  1.524e-02   9.524  < 2e-16 ***
MANUFIND      1.109e-01  1.393e-02   7.963 8.80e-13 ***
UTILIND       1.422e-01  1.516e-02   9.379 3.78e-16 ***
WHOLESALEIND  2.884e-02  1.766e-02   1.633 0.104910    
FININD        6.215e-02  2.084e-02   2.983 0.003436 ** 
BUSREPIND     6.588e-02  1.755e-02   3.753 0.000266 ***
SERVICEIND    5.412e-02  2.403e-02   2.252 0.026058 *  
ENTERTAININD -1.192e-01  3.060e-02  -3.896 0.000159 ***
PROFIND       1.536e-01  1.854e-02   8.285 1.55e-13 ***
OVERWORK      6.738e-02  1.007e-02   6.693 6.59e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1367476)

Number of Fisher Scoring iterations: 2

>coeftest(male.nat.reg, vcov = vcovHC(male.nat.reg, type = 'HC0'))

z test of coefficients:

                Estimate  Std. Error   z value  Pr(>|z|)    
(Intercept)   1.5992e+00  9.7176e-08  16456481 < 2.2e-16 ***
AGE           4.0296e-02  5.4766e-09   7357823 < 2.2e-16 ***
I(AGE^2)     -4.1314e-04  7.3222e-11  -5642330 < 2.2e-16 ***
NOHSDEG      -1.7305e-01  1.4431e-08 -11991482 < 2.2e-16 ***
ASSOC         1.1378e-01  1.4248e-08   7985751 < 2.2e-16 ***
SOMECOLL      5.0035e-02  9.9689e-09   5019088 < 2.2e-16 ***
BACHELOR      2.1476e-01  2.0588e-08  10430993 < 2.2e-16 ***
GRADUATE      3.3533e-01  8.3327e-08   4024301 < 2.2e-16 ***
INMETRO       3.8790e-02  8.9666e-09   4326013 < 2.2e-16 ***
NCHILDOLD     1.3738e-02  5.2244e-09   2629554 < 2.2e-16 ***
NCHILDYOUNG   2.3344e-02  5.5405e-09   4213300 < 2.2e-16 ***
NOTWHITE     -5.0261e-02  1.0150e-08  -4951908 < 2.2e-16 ***
MARRIED      -8.2263e-03  1.8867e-08   -436026 < 2.2e-16 ***
NEVERMARRIED -4.6440e-02  1.7847e-08  -2602096 < 2.2e-16 ***
NOTCITIZEN   -6.7594e-02  2.4446e-08  -2765080 < 2.2e-16 ***
STUDENT      -1.2306e-01  3.2514e-08  -3785014 < 2.2e-16 ***
VET           3.3356e-02  3.0996e-08   1076125 < 2.2e-16 ***
INUNION       2.3659e-01  1.7786e-08  13301699 < 2.2e-16 ***
PROFOCC       2.5594e-01  2.2177e-08  11540563 < 2.2e-16 ***
TSAOCC        9.9971e-02  1.6707e-08   5983922 < 2.2e-16 ***
FFFOCC        2.0762e-02  2.3625e-08    878801 < 2.2e-16 ***
PRODOCC       2.1638e-01  1.3602e-08  15907683 < 2.2e-16 ***
LABOROCC      6.0741e-02  1.3445e-08   4517854 < 2.2e-16 ***
AFFIND        6.8342e-02  3.2895e-08   2077563 < 2.2e-16 ***
MININGIND     3.0343e-01  3.2948e-08   9209326 < 2.2e-16 ***
CONSTIND      1.4512e-01  2.1871e-08   6635457 < 2.2e-16 ***
MANUFIND      1.1094e-01  1.9636e-08   5649569 < 2.2e-16 ***
UTILIND       1.4216e-01  2.0930e-08   6792029 < 2.2e-16 ***
WHOLESALEIND  2.8842e-02  1.8662e-08   1545525 < 2.2e-16 ***
FININD        6.2147e-02  2.8214e-08   2202691 < 2.2e-16 ***
BUSREPIND     6.5883e-02  2.7866e-08   2364269 < 2.2e-16 ***
SERVICEIND    5.4118e-02  2.4758e-08   2185907 < 2.2e-16 ***
ENTERTAININD -1.1922e-01  2.9474e-08  -4044852 < 2.2e-16 ***
PROFIND       1.5364e-01  3.0132e-08   5098879 < 2.2e-16 ***
OVERWORK      6.7376e-02  1.0981e-08   6135525 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The sandwich package is object-oriented and essentially relies on two methods being available: estfun() and bread() , see the package vignettes for more details. For objects of class svyglm these methods are not available but as svyglm objects inherit from glm the glm methods are found and used. I suspect that this leads to incorrect results in the survey context though, possibly by a weighting factor or so. I'm not familiar enough with the survey package to provide a workaround. The survey maintainer might be able to say more... Hope that helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM