
Calculating Confidence Intervals with Predicted Values

I have a data frame of student attributes and test scores, and I fit a separate linear model for each grade level (1 through 12), using nest() and map() to build the models and the broom package to tidy their output. Below is a simplified example dataset and the code I am using.

Once the models are trained, I use them to predict scores for the 2020 school year: the 1st-grade model is applied only to the 1st-grade rows of the test set, the 2nd-grade model only to the 2nd-grade rows, and so on.

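# packages used below (%>%, filter(), nest(), map(), tidy()/augment()/glance())
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(2020)  # make the simulated data reproducible

# start df creation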

school_year <- rep(2017:2020, 120)
grade <- rep(1:12, each = 40)
attendance_rate <- round(runif(480, min=25, max=100), 1)
test_growth <- round(runif(480, min = -12, max = 38))
binary_flag <- round(runif(480, min = 0, max = 1))
score <- round(runif(480, min = 92, max = 370))
survey_response <- round(runif(480, min = 1, max = 4))

df <- data.frame(school_year, grade, attendance_rate, test_growth, binary_flag, score, survey_response) 

df$survey_response[df$grade == 1] <- NA

# end df creation

df_train <- df %>% filter(school_year != 2020)    # 2017-2019: fit the models
df_predict <- df %>% filter(school_year == 2020)  # 2020: out-of-sample predictions


#create models
model <- df_train %>%
  group_by(grade) %>% 
  nest() %>% 
  mutate(fit = map(data, ~ if(all(is.na(.x$survey_response)))
    lm(score ~ attendance_rate + test_growth + binary_flag, data = .x) 
    else lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance))

#generate projections for values in df_predict
df_predict %>%
   nest(test_data = -grade) %>%
   inner_join(model, by = 'grade') %>%
   mutate(result = map2(fit, test_data, predict))

I would like to know whether I can generate a 95% interval for each student in df_predict at the same time as I generate the out-of-sample predictions. The standard deviation needs to be grade-specific. The interval would give me lower and upper cut points that I could use to flag outliers in the actual 2020 test results.
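For reference, this is the textbook OLS 95% prediction interval for a new student with predictor vector x_0, where the grade-g model has n_g training rows, p_g coefficients (intercept included; 4 for grade 1 and 5 for the other grades in this example), design matrix X_g and residual sum of squares RSS_g. These symbols are introduced here for illustration only. Because s_g is estimated separately from each grade's fit, the spread is grade-specific; dropping the "1 +" under the square root gives the narrower confidence interval for the mean score rather than for an individual student.

```latex
% Grade g: n_g training rows, p_g coefficients (incl. intercept), design matrix X_g
s_g^2 = \frac{\mathrm{RSS}_g}{n_g - p_g}, \qquad
\hat{y}_0 \pm t_{0.975,\; n_g - p_g}\, s_g \sqrt{1 + x_0^\top \left(X_g^\top X_g\right)^{-1} x_0}
```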

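Try this: add another list-column, confinter, in which predict() is called with interval = 'prediction'. For each student this returns a matrix with the fitted value (fit) and the lower (lwr) and upper (upr) bounds of a 95% prediction interval; 95% is the default level of predict.lm(). Because each grade has its own lm fit, the residual standard error behind the interval is grade-specific. A prediction interval is the right choice for flagging individual scores; interval = 'confidence' would only cover the mean score of students with the same attributes and would be much narrower. Here is the code: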

# generate projections and 95% prediction intervals for values in df_predict
dfpred2 <- df_predict %>%
  nest(test_data = -grade) %>%
  inner_join(model, by = 'grade') %>%
  mutate(result = map2(fit, test_data, predict),
         confinter = map2(fit, test_data, predict, interval = 'prediction'))
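A minimal end-to-end sketch of how the per-grade interval matrices can be unpacked and used to flag outliers. It assumes tidyr >= 1.0 (for nest(data = -grade) and unnest() on several columns at once) and drops survey_response for brevity; the names resid_se, intervals, flagged and outlier are illustrative and not part of the original code.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)

# Simulated data shaped like the question's (survey_response omitted for brevity)
df <- data.frame(
  school_year     = rep(2017:2020, 120),
  grade           = rep(1:12, each = 40),
  attendance_rate = round(runif(480, min = 25, max = 100), 1),
  test_growth     = round(runif(480, min = -12, max = 38)),
  binary_flag     = round(runif(480, min = 0, max = 1)),
  score           = round(runif(480, min = 92, max = 370))
)

df_train   <- filter(df, school_year != 2020)
df_predict <- filter(df, school_year == 2020)

# One lm per grade; resid_se is that grade's residual standard error
model <- df_train %>%
  nest(data = -grade) %>%
  mutate(
    fit      = map(data, ~ lm(score ~ attendance_rate + test_growth + binary_flag, data = .x)),
    resid_se = map_dbl(fit, stats::sigma)
  )

# Per-student 95% prediction intervals from the matching grade's model
flagged <- df_predict %>%
  nest(test_data = -grade) %>%
  inner_join(select(model, grade, fit), by = "grade") %>%
  mutate(intervals = map2(fit, test_data, ~ as.data.frame(
    predict(.x, newdata = .y, interval = "prediction", level = 0.95)
  ))) %>%
  select(grade, test_data, intervals) %>%
  unnest(c(test_data, intervals)) %>%
  # a student is flagged when the actual score falls outside [lwr, upr]
  mutate(outlier = score < lwr | score > upr)

flagged %>% filter(outlier) %>% select(grade, score, fit, lwr, upr)
```

model$resid_se holds one value per grade, which is why the interval widths differ by grade. Adding survey_response back only needs the same if/else inside map() as in the question; predict() then uses whichever formula each grade's model was fitted with. The number of flagged students will vary with the seed.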
