
Generate asymptotic confidence intervals for difference of fitted values in time series regression

I have a monthly time series of count data to which I have fitted a negative binomial AR(1) segmented regression model, with dummy variables for the months to control for seasonal effects and an indicator variable for an intervention. I am attempting to quantify the magnitude of the intervention effect by calculating the difference between the predicted values for the last 12 months of the series with and without the intervention effect, and I want to report confidence intervals for that difference. I have a rough understanding of how to do this and understand the need for the multivariate delta method, but I am having a hard time implementing it in R. Dummy data for clarity:

## An explanation of the structure of the df (dummy values; the real series is longer)
count <- c(10, 12, 9, 9, 12, 15, 17, 13, 10, 6)   # monthly counts
time_of_intervention <- 7                          # for example
n <- length(count)
df <- data.frame(
    count = count, 
    month = factor(rep(month.name, length.out = n), levels = month.name), 
    time = 1:n, 
    intervention = ifelse(1:n >= time_of_intervention, 1, 0), 
    intervention_trend = c(rep(0, time_of_intervention - 1), 1:(n - time_of_intervention + 1)),
    lag1 = c(NA, count[-n])
)

## models with and without the intervention (glm.nb and mvrnorm are in package MASS)
library(MASS)

nullmodel <- glm.nb(count ~ month + time + log(lag1), data = df)
fullmodel <- glm.nb(count ~ intervention + intervention_trend + month + time + log(lag1), data = df)

## the difference I would like confidence intervals for
sum_without <- sum(tail(exp(predict(nullmodel)), 12))   # predicted counts, last 12 months
sum_with    <- sum(tail(exp(predict(fullmodel)), 12))
diff <- sum_with - sum_without

I thought I would use multivariate normal sampling of the coefficients of the models with and without the intervention, repeatedly sum the fitted values for the last 12 time points to produce a distribution of the difference, and estimate the confidence intervals from there. I don't know how to do this in R. This is as far as I have gotten:

without_coeff_sample <- mvrnorm(n = 10000, mu = nullmodel$coefficients, Sigma = vcov(nullmodel))
with_coeff_sample <- mvrnorm(n = 10000, mu = fullmodel$coefficients, Sigma = vcov(fullmodel))

I'm not sure how to proceed from here. Any thoughts appreciated.
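My rough idea for the continuation is something like the sketch below, using the model matrices of the two fits so that each sampled coefficient vector can be applied to the last 12 rows, but I have not verified it (and it ignores that the lag term uses observed rather than predicted counts):

X_without <- model.matrix(nullmodel)                  # rows with NA lag1 are already dropped
X_with    <- model.matrix(fullmodel)
last12 <- (nrow(X_without) - 11):nrow(X_without)      # indices of the last 12 months

## 10000 x 12 matrices of simulated linear predictors for the last 12 months
lin_without <- without_coeff_sample %*% t(X_without[last12, ])
lin_with    <- with_coeff_sample    %*% t(X_with[last12, ])

## 12-month sums of predicted counts for each draw, and the simulated differences
sum_without_sim <- rowSums(exp(lin_without))
sum_with_sim    <- rowSums(exp(lin_with))
diff_sim <- sum_with_sim - sum_without_sim

quantile(diff_sim, c(0.025, 0.5, 0.975))              # simulated interval for the difference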

I use the built-in quine dataset as an example. Suppose we are interested in the "effect" of being Non-Aboriginal ( Eth == "N" ) on days of school missed. Step 2 is where we create the counterfactual data, setting Eth to 1 (for Aboriginal) or 2 (for Non-Aboriginal). In step 3, we source a function written by SO user MrFlick that lets us build a "fake" glm object (with coefficients we define ourselves) to pass into predict() . In step 4, we compute the predicted values: for each simulated coefficient vector we create such a fake glm object and pass it into predict() together with the counterfactual data.

library(MASS)

# (1) Fit model and simulate coefficients
quine.nb1 <- glm.nb(Days ~ Sex + Age + Eth + Lrn, data = quine)
coefs <- mvrnorm(1000, coef(quine.nb1), vcov(quine.nb1))

# (2) Create counterfactual data
EthA <- EthN <- quine
EthA$Eth <- 1
EthN$Eth <- 2

# (3) Source function `makeglm` to create "fake" glm object
source("https://gist.githubusercontent.com/MrFlick/ae299d8f3760f02de6bf/raw/6c54045bf29d64b31239fb9c56d5f32f085cdc40/makeglm.R")

# (4) Create predicted values using simulated coefficients and counterfactual data
preds <- lapply(list(EthA, EthN), function(dd) {
  apply(coefs, MAR = 1, function(x) {
    newmodel <- do.call(
                  makeglm, 
                  c(list(formula = as.formula(Days ~ Sex + Age + Eth + Lrn)), 
                    x,
                    list(family = quasi)))
    exp(predict(newmodel, dd))
  })
})

# (5) Compute average treatment effect
ATE <- rowMeans(preds[[2]] - preds[[1]])
quantile(ATE, c(0.05, 0.5, .95))
#        5%       50%       95% 
# -6.798913 -5.188922 -2.791218 
mean(ATE)
# [1] -5.182645

All else equal, Non-Aboriginal students miss about 5 fewer days of school than Aboriginal students.

One issue you should consider is what your estimand is: in the example above, it is the average treatment effect. You might instead want the average effect on the treated, i.e. what would have happened to the units that received the intervention had they not received it. Your estimand determines how you create your counterfactual data. In addition, since the linear time trend for the intervention depends on the intervention indicator, you also need to change that variable when you create your counterfactual data. One final issue that complicates your analysis is the lagged variable: every predicted value then depends on the previously predicted value. I've never done something like this before, but intuitively you may need to generate the predicted values iteratively, along the lines of the untested sketch below.
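Roughly, using the df, fullmodel and time_of_intervention from your question, and shown here for a single coefficient vector (the same loop would sit inside the simulation), it could look something like this:

## counterfactual copy of df with the intervention switched off
cf <- df
cf$intervention       <- 0
cf$intervention_trend <- 0
cf$pred <- exp(predict(fullmodel, newdata = cf))       # one-step-ahead predictions to start from

## feed each month's *predicted* count forward as lag1 for the next month
post <- which(cf$time >= time_of_intervention)
for (i in post[-1]) {
  cf$lag1[i] <- cf$pred[i - 1]
  cf$pred[i] <- exp(predict(fullmodel, newdata = cf[i, ]))
}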
