
R: matching a data frame before and after a regression with missing values and a subset

I have included a toy example to recreate my error:

data(cars)
cars$dist[cars$dist<5]<-NA
cars$fast<- (cars$speed>10)*1

fit<-lm(speed~dist,cars)


cl   <- function(dat,fm, cluster){
  require(sandwich, quietly = TRUE)
  require(lmtest, quietly = TRUE)
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj  <- apply(estfun(fm),2, function(x) tapply(x, cluster, sum));
  vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
  result<-coeftest(fm, vcovCL)
  return(result)}

cl(cars,fit,cars$fast)

Error in tapply(x, cluster, sum) : arguments must have same length

The issue is that the original data frame is larger than the one used in the regression, because rows with NAs are removed and the regression is run on a subset. I need robust standard errors, so I have to compute them with the function cl. How do I identify which rows were removed and subset the cluster variable accordingly, so that it lines up with the data actually used in the fit?

Thanks in advance.

You can use complete.cases to identify the NAs in your data, but in this case it is better to use the information stored in your lm object about how it handled NAs (thanks to @Dwin for pointing out a better way to access this information and, more generally, how to simplify this answer).

data(cars)
cars$dist
cars$dist[cars$dist < 5] <- NA
cars$fast<- (cars$speed > 10) * 1
which(!complete.cases(cars))
## [1] 1 3

fit <- lm(speed ~ dist, data = cars)
fit$na.action
## 1 3 
## 1 3 
## attr(,"class")
## [1] "omit"
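
Since the question also mentions subset regressions, here is a small sketch of a more general way to recover the rows that were used. As far as I know, fit$na.action only records rows dropped for missing values, and when subset= is used its indices refer to positions in the already subsetted data, so it may not line up with the original data frame. The row names of model.frame(fit) should cover both cases. The fit_sub model and the speed > 5 condition below are made up for illustration only.

```r
data(cars)
cars$dist[cars$dist < 5] <- NA
cars$fast <- (cars$speed > 10) * 1

# Hypothetical subset fit: drop the slowest cars in addition to the NA rows
fit_sub <- lm(speed ~ dist, data = cars, subset = speed > 5)

# Row names of the model frame are the row names of `cars` that were used
used_rows <- rownames(model.frame(fit_sub))

# Cluster variable restricted to the rows that entered the fit
cluster_used <- cars[used_rows, "fast"]

nrow(cars)            # rows in the original data
nobs(fit_sub)         # rows used in the fit
length(cluster_used)  # same length as the rows used in the fit
```

Indexing by row name rather than by position means the same two lines work whether rows were lost to NAs, to subset=, or to both.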

Using fit$na.action, your final function should look like this:

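cl <- function(fm, cluster){
    require(sandwich, quietly = TRUE)   # estfun(), sandwich()
    require(lmtest, quietly = TRUE)     # coeftest()
    # Drop the cluster entries for rows that lm() removed because of NAs
    # (fm$na.action is NULL when nothing was removed)
    used <- if (is.null(fm$na.action)) cluster else cluster[-fm$na.action]
    # M and N are taken from the full cluster vector, as in the question
    M <- length(unique(cluster))
    N <- length(cluster)
    K <- fm$rank
    dfc <- (M/(M-1))*((N-1)/(N-K))      # small-sample correction
    # Sum the score contributions within each cluster
    uj  <- apply(estfun(fm), 2, function(x) tapply(x, used, sum))
    vcovCL <- dfc*sandwich(fm, meat. = crossprod(uj)/N)
    result <- coeftest(fm, vcovCL)
    result}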

cl(fit,cars$fast)
## t test of coefficients:

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   8.8424     2.9371    3.01  0.00422
## dist          0.1561     0.0426    3.67  0.00063
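
One possible refinement, offered as a sketch rather than a definitive fix: in the function above, M and N are computed from the full cluster vector, so they still count the rows that lm() dropped. If you want the correction factor and the meat to be based only on the observations used in the fit, you could subset the cluster variable first and derive everything from that. The name cl_used and its data / cluster_col arguments are my own and are not part of sandwich or lmtest. The standard errors from this variant will differ slightly from the output shown above.

```r
library(sandwich)  # estfun(), sandwich()
library(lmtest)    # coeftest()

# Hypothetical variant of cl(): `data` is the data frame passed to lm(),
# `cluster_col` is the name of the clustering column in that data frame.
cl_used <- function(fm, data, cluster_col) {
  # Keep only the cluster values for rows that entered the fit
  cluster <- data[rownames(model.frame(fm)), cluster_col]
  M <- length(unique(cluster))   # clusters among the rows used
  N <- length(cluster)           # observations used in the fit
  K <- fm$rank
  dfc <- (M / (M - 1)) * ((N - 1) / (N - K))
  # Sum the score contributions within each cluster
  uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))
  coeftest(fm, dfc * sandwich(fm, meat. = crossprod(uj) / N))
}

data(cars)
cars$dist[cars$dist < 5] <- NA
cars$fast <- (cars$speed > 10) * 1
fit <- lm(speed ~ dist, data = cars)

res <- cl_used(fit, cars, "fast")
res
```

Passing the data frame and a column name, instead of a ready-made vector, keeps the row matching inside the function so the caller cannot hand in a cluster vector of the wrong length.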
