
R calculate robust standard errors (vcovHC) for lm model with singularities

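In R, how can I compute robust standard errors with vcovHC() when some coefficients are dropped because of singularities? The standard lm function seems to compute the usual standard errors just fine for all coefficients that are actually estimated, but vcovHC() throws an error: "Error in bread. %*% meat. : non-conformable arguments".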

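(The actual data I use is a bit more complicated. In fact, it is a model with two different fixed effects, and I run into local singularities that I cannot simply get rid of, or at least I don't know how. For the two fixed effects, the first factor has 150 levels and the second has 142 levels; in total there are nine singularities, which arise because the data were collected in ten blocks.)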

Here is my output:

Call:
lm(formula = one ~ two + three + Jan + Feb + Mar + Apr + May + 
Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-130.12  -60.95    0.08   61.05  137.35 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1169.74313   57.36807  20.390   <2e-16 ***
two           -0.07963    0.06720  -1.185    0.237    
three         -0.04053    0.06686  -0.606    0.545    
Jan            8.10336   22.05552   0.367    0.714    
Feb            0.44025   22.11275   0.020    0.984    
Mar           19.65066   22.02454   0.892    0.373    
Apr          -13.19779   22.02886  -0.599    0.550    
May           15.39534   22.10445   0.696    0.487    
Jun          -12.50227   22.07013  -0.566    0.572    
Jul          -20.58648   22.06772  -0.933    0.352    
Aug           -0.72223   22.36923  -0.032    0.974    
Sep           12.42204   22.09296   0.562    0.574    
Oct           25.14836   22.04324   1.141    0.255    
Nov           18.13337   22.08717   0.821    0.413    
Dec                 NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 69.63 on 226 degrees of freedom
Multiple R-squared: 0.04878,    Adjusted R-squared: -0.005939 
F-statistic: 0.8914 on 13 and 226 DF,  p-value: 0.5629 

> model$se <- vcovHC(model)
Error in bread. %*% meat. : non-conformable arguments

Here is a trimmed-down minimal example that reproduces the error:

library(sandwich)
set.seed(101)
dat<-data.frame(one=c(sample(1000:1239)),
              two=c(sample(200:439)),
              three=c(sample(600:839)),
              Jan=c(rep(1,20),rep(0,220)),
              Feb=c(rep(0,20),rep(1,20),rep(0,200)),
              Mar=c(rep(0,40),rep(1,20),rep(0,180)),
              Apr=c(rep(0,60),rep(1,20),rep(0,160)),
              May=c(rep(0,80),rep(1,20),rep(0,140)),
              Jun=c(rep(0,100),rep(1,20),rep(0,120)),
              Jul=c(rep(0,120),rep(1,20),rep(0,100)),
              Aug=c(rep(0,140),rep(1,20),rep(0,80)),
              Sep=c(rep(0,160),rep(1,20),rep(0,60)),
              Oct=c(rep(0,180),rep(1,20),rep(0,40)),
              Nov=c(rep(0,200),rep(1,20),rep(0,20)),
              Dec=c(rep(0,220),rep(1,20))) 
model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
summary(model)
model$se <- vcovHC(model)
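To see where the non-conformable shapes plausibly come from, the base-R sketch below compares the number of columns in the design matrix with the rank of the fit. It rests on an assumption from reading sandwich's lm methods, which may differ between sandwich versions: the "bread" is built from summary()'s cov.unscaled, which only covers the estimated coefficients, while the "meat" uses every column of model.matrix(), including the aliased Dec dummy. month.abb is base R's built-in vector c("Jan", ..., "Dec"); the loop is just a compact way to rebuild the same dummies as the question.

```r
# Rebuild the question's data: three sampled columns plus one 0/1 dummy per month
set.seed(101)
dat <- data.frame(one = sample(1000:1239), two = sample(200:439), three = sample(600:839))
for (j in 1:12) dat[[month.abb[j]]] <- as.numeric(rep(1:12, each = 20) == j)

model <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun +
              Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

all(rowSums(dat[month.abb]) == 1)       # TRUE: the 12 dummies add up to the intercept column
X <- model.matrix(model)
ncol(X)                                 # 15 columns in the design matrix
model$rank                              # 14: only 14 of them are linearly independent
names(coef(model))[is.na(coef(model))]  # "Dec" is the aliased coefficient
dim(summary(model)$cov.unscaled)        # 14 x 14: covers only the estimated coefficients
```

A 14 x 14 matrix cannot be multiplied with a 15 x 15 one, which matches the wording of the error message.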

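Models with singularities are never good; they should be fixed. In your case you have 12 coefficients for the 12 months, but there is also a global intercept! Because the 12 month dummies add up to the intercept column, you actually have 13 coefficients but can only estimate 12 real parameters. What you really want is to disable the global intercept, so that you get something more like month-specific intercepts: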

> model <- lm(one ~ 0 + two + three + Jan + Feb + Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data=dat)
> summary(model)

Call:
lm(formula = one ~ 0 + two + three + Jan + Feb + Mar + Apr + 
    May + Jun + Jul + Aug + Sep + Oct + Nov + Dec, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
two     -0.09670    0.06621  -1.460    0.146    
three    0.02446    0.06666   0.367    0.714    
Jan   1130.05812   52.79625  21.404   <2e-16 ***
Feb   1121.32904   55.18864  20.318   <2e-16 ***
Mar   1143.50310   53.59603  21.336   <2e-16 ***
Apr   1143.95365   54.99724  20.800   <2e-16 ***
May   1136.36429   53.38218  21.287   <2e-16 ***
Jun   1129.86010   53.85865  20.978   <2e-16 ***
Jul   1105.10045   54.94940  20.111   <2e-16 ***
Aug   1147.47152   54.57201  21.027   <2e-16 ***
Sep   1139.42205   53.58611  21.263   <2e-16 ***
Oct   1117.75075   55.35703  20.192   <2e-16 ***
Nov   1129.20208   53.54934  21.087   <2e-16 ***
Dec   1149.55556   53.52499  21.477   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.9964,    Adjusted R-squared:  0.9961 
F-statistic:  4409 on 14 and 226 DF,  p-value: < 2.2e-16

This is then a normal (full-rank) model, so you shouldn't have any problems with vcovHC.
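As a cross-check that the intercept-free model is full rank and that a heteroskedasticity-consistent covariance can be formed from it, here is a base-R sketch of the sandwich formula (X'X)^-1 X' diag(w) X (X'X)^-1, with w = e^2 for HC0 and w = e^2 / (1 - h)^2 for HC3. These are the textbook definitions; we expect, but have not verified here, that they agree with sandwich::vcovHC(model0, type = "HC0") and with vcovHC's default type "HC3". model0, hc0 and hc3 are just local names for this sketch.

```r
# Rebuild the question's data (same seed and sampling order as the question)
set.seed(101)
dat <- data.frame(one = sample(1000:1239), two = sample(200:439), three = sample(600:839))
for (j in 1:12) dat[[month.abb[j]]] <- as.numeric(rep(1:12, each = 20) == j)

# Intercept-free model from the answer: 12 month-specific intercepts
model0 <- lm(one ~ 0 + two + three + Jan + Feb + Mar + Apr + May + Jun +
               Jul + Aug + Sep + Oct + Nov + Dec, data = dat)
X <- model.matrix(model0)
stopifnot(model0$rank == ncol(X))  # full rank: no NA coefficients

e <- residuals(model0)
h <- hatvalues(model0)              # leverages, used by HC3
bread <- solve(crossprod(X))        # (X'X)^-1

# X * w scales row i of X by w[i], so crossprod(X * w) equals X' diag(w^2) X
hc0 <- bread %*% crossprod(X * e) %*% bread              # White / HC0
hc3 <- bread %*% crossprod(X * (e / (1 - h))) %*% bread  # HC3
round(sqrt(diag(hc3)), 3)           # robust (HC3) standard errors
```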

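What you seem to be aiming for is a fixed-effects estimate. Although this question was asked a while ago, I ran into the same problem, and here is my solution: fixed effects can be controlled for by including + factor() in the estimation equation.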

So I first created an additional column:

# create an additional column in your data
dat$month <- "0"  # this column will contain the month, not a dummy for months
for (i in 1:length(dat$one)) {
  if (dat[i, "Jan"] == 1) { dat[i, "month"] <- "Jan" }
  if (dat[i, "Feb"] == 1) { dat[i, "month"] <- "Feb" }
  if (dat[i, "Mar"] == 1) { dat[i, "month"] <- "Mar" }
  if (dat[i, "Apr"] == 1) { dat[i, "month"] <- "Apr" }
  if (dat[i, "May"] == 1) { dat[i, "month"] <- "May" }
  if (dat[i, "Jun"] == 1) { dat[i, "month"] <- "Jun" }
  if (dat[i, "Jul"] == 1) { dat[i, "month"] <- "Jul" }
  if (dat[i, "Aug"] == 1) { dat[i, "month"] <- "Aug" }
  if (dat[i, "Sep"] == 1) { dat[i, "month"] <- "Sep" }
  if (dat[i, "Oct"] == 1) { dat[i, "month"] <- "Oct" }
  if (dat[i, "Nov"] == 1) { dat[i, "month"] <- "Nov" }
  if (dat[i, "Dec"] == 1) { dat[i, "month"] <- "Dec" }
}
i <- NULL
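The row-by-row loop above works; a vectorized alternative, which is not from the answer, uses base R's max.col() to find which dummy is set in each row. It assumes every row has exactly one dummy equal to 1: a row with no 1 would get a random month, because max.col() breaks ties at random by default.

```r
# Rebuild the question's data (same seed and sampling order as the question)
set.seed(101)
dat <- data.frame(one = sample(1000:1239), two = sample(200:439), three = sample(600:839))
for (j in 1:12) dat[[month.abb[j]]] <- as.numeric(rep(1:12, each = 20) == j)

# max.col() gives, for each row, the index of the column holding the row maximum.
# Every row has exactly one 1 among the 12 dummies, so that index is the row's month.
dat$month <- month.abb[max.col(as.matrix(dat[month.abb]))]
table(dat$month)  # 20 rows per month
```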

This column can now be used as a fixed (constant) effect factor in the regression equation:

> #you can use the created column as fixed effect factor in your regression
> model_A <- lm(one ~ two + three + factor(month), data=dat)
> summary(model_A)

Call:
lm(formula = one ~ two + three + factor(month), data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1143.95365   54.99724  20.800   <2e-16 ***
two                -0.09670    0.06621  -1.460   0.1455    
three               0.02446    0.06666   0.367   0.7141    
factor(month)Aug    3.51788   22.09948   0.159   0.8737    
factor(month)Dec    5.60192   22.41204   0.250   0.8029    
factor(month)Feb  -22.62460   22.10889  -1.023   0.3072    
factor(month)Jan  -13.89553   22.25117  -0.624   0.5329    
factor(month)Jul  -38.85320   22.13980  -1.755   0.0806 .  
factor(month)Jun  -14.09355   22.18707  -0.635   0.5259    
factor(month)Mar   -0.45055   22.13638  -0.020   0.9838    
factor(month)May   -7.58935   22.14137  -0.343   0.7321    
factor(month)Nov  -14.75156   22.27288  -0.662   0.5084    
factor(month)Oct  -26.20290   22.09416  -1.186   0.2369    
factor(month)Sep   -4.53159   22.26334  -0.204   0.8389    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.04381,    Adjusted R-squared:  -0.01119 
F-statistic: 0.7966 on 13 and 226 DF,  p-value: 0.6635

> #and also do the same without intercept if so needed
> model_B <- lm(one ~ 0 + two + three + factor(month), data=dat)
> summary(model_B)

Call:
lm(formula = one ~ 0 + two + three + factor(month), data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-133.817  -55.636    3.329   56.768  126.772 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
two                -0.09670    0.06621  -1.460    0.146    
three               0.02446    0.06666   0.367    0.714    
factor(month)Apr 1143.95365   54.99724  20.800   <2e-16 ***
factor(month)Aug 1147.47152   54.57201  21.027   <2e-16 ***
factor(month)Dec 1149.55556   53.52499  21.477   <2e-16 ***
factor(month)Feb 1121.32904   55.18864  20.318   <2e-16 ***
factor(month)Jan 1130.05812   52.79625  21.404   <2e-16 ***
factor(month)Jul 1105.10045   54.94940  20.111   <2e-16 ***
factor(month)Jun 1129.86010   53.85865  20.978   <2e-16 ***
factor(month)Mar 1143.50310   53.59603  21.336   <2e-16 ***
factor(month)May 1136.36429   53.38218  21.287   <2e-16 ***
factor(month)Nov 1129.20208   53.54934  21.087   <2e-16 ***
factor(month)Oct 1117.75075   55.35703  20.192   <2e-16 ***
factor(month)Sep 1139.42205   53.58611  21.263   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 69.81 on 226 degrees of freedom
Multiple R-squared:  0.9964,    Adjusted R-squared:  0.9961 
F-statistic:  4409 on 14 and 226 DF,  p-value: < 2.2e-16
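model_A and model_B are the same fit written two ways: with an intercept, factor() uses treatment contrasts, so the alphabetically first level (Apr) becomes the baseline and drops out, which is what keeps model_A full rank. The sketch below checks this on rebuilt data; it assumes the month column is simply each month repeated over its 20 consecutive rows, which is what the loop above produces for the question's data.

```r
set.seed(101)
dat <- data.frame(one = sample(1000:1239), two = sample(200:439), three = sample(600:839))
dat$month <- rep(month.abb, each = 20)  # same month column the loop above builds

model_A <- lm(one ~ two + three + factor(month), data = dat)
model_B <- lm(one ~ 0 + two + three + factor(month), data = dat)

levels(factor(dat$month))[1]                 # "Apr": alphabetically first, so the baseline
all.equal(fitted(model_A), fitted(model_B))  # TRUE: same fit, different parameterization
# The intercept of model_A is the Apr intercept of model_B ...
all.equal(unname(coef(model_A)["(Intercept)"]), unname(coef(model_B)["factor(month)Apr"]))
# ... and each month effect in model_A is that month's intercept minus Apr's
all.equal(unname(coef(model_A)["factor(month)Jan"]),
          unname(coef(model_B)["factor(month)Jan"] - coef(model_B)["factor(month)Apr"]))
```

In the output above this is visible directly: 1130.05812 - 1143.95365 = -13.89553, the Jan effect reported for model_A.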

This lets you run a regular OLS regression on panel data.
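The question's real case (two fixed-effect factors with nine singularities from ten data-collection blocks) may not be fixable by dropping the intercept alone. One possible workaround, which does not come from the answers above, is to keep only the columns lm() was able to estimate and refit on that reduced model matrix: since those are exactly the columns lm() used, the estimates and residuals should be unchanged, and the refit is full rank. The panel below is entirely made up (3 blocks, each with its own 2 units and 2 periods); we assume, without knowing the asker's data, that both factors are nested in blocks in a similar way, which in this layout yields blocks - 1 aliased dummies. panel, fit, fit2, keep and Xk are hypothetical names.

```r
# Made-up panel, NOT the asker's data: 3 blocks, each with its own 2 units and
# its own 2 periods; every unit is observed twice in each period of its block
set.seed(42)
grid  <- expand.grid(rep = 1:2, u = 1:2, p = 1:2, block = 1:3)
panel <- data.frame(
  unit   = factor(paste0("u", grid$block, grid$u)),  # unit nested in block
  period = factor(paste0("t", grid$block, grid$p)),  # period nested in block
  x      = rnorm(nrow(grid))
)
panel$y <- 1 + 0.5 * panel$x + rnorm(nrow(panel))

# Two-way fixed effects: both factors span the block indicators, so
# (number of blocks - 1) = 2 dummies are aliased and reported as NA
fit <- lm(y ~ x + unit + period, data = panel)
sum(is.na(coef(fit)))

# Keep only the columns lm() could estimate and refit on them
keep <- !is.na(coef(fit))
Xk   <- model.matrix(fit)[, keep, drop = FALSE]
fit2 <- lm(panel$y ~ 0 + Xk)  # Xk already contains the intercept column

# Same estimates and residuals, but the refit is full rank, so a call such as
# sandwich::vcovHC(fit2) should no longer run into non-conformable shapes
all.equal(unname(coef(fit2)), unname(coef(fit)[keep]))
all.equal(unname(residuals(fit2)), unname(residuals(fit)))
```

Which dummies end up dropped depends on the order of the columns in the formula, so the remaining fixed-effect coefficients should be interpreted relative to whatever levels were removed.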


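Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.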

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM