简体   繁体   中英

How to calculate the confidence interval for the difference in predictions of the predict() R function?

I'm new to R and I wanted to ask about how to obtain a confidence interval for the difference between two predicted values estimated using predict() function. For example,

data("cars", package = "datasets")

model <- lm(dist ~ speed, data = cars)
summary(model)

prediction1 = predict(model, newdata = list(speed = 12), interval = "confidence")
prediction1

prediction2 = predict(model, newdata = list(speed = 20), interval = "confidence")
prediction2

predict.dif = prediction2 - prediction1
predict.dif

what I got is
prediction1 = 29.60981 (lwr)24.39514 (upr)34.82448
prediction2 = 61.06908 (lwr)55.24729 (upr)66.89088

and for the difference in prediction I get
predict.diff = 31.45927 (lwr)30.85215 (upr)32.06639

My question is, in the previous example, I get the difference in predictions and R also subtracts the confidence intervals. Is it correct to subtract CIs this way? and why? and if not, I was wondering if there is a way to calculate such CI for the difference in predictions.

Thank you so much

You pose a good question. In summary, we should not compute confidence intervals separately and then take the differences between their respective lower bounds and respective upper bounds.

Why?

To explain why, rather than getting too much into the math, I'll illustrate via an example:

  1. Let's take your code snippet and just change prediction2 to be a prediction when speed equals 13:

     data("cars", package = "datasets") model <- lm(dist ~ speed, data = cars) summary(model) prediction1 = predict(model, newdata = list(speed = 12), interval = "confidence") prediction1 prediction2 = predict(model, newdata = list(speed = 13), interval = "confidence") prediction2 predict.dif = prediction2 - prediction1 predict.dif

    In this case your output would be:

     fit lwr upr 1 3.932409 4.336198 3.52862

    But question yourself as to what we're doing here. You have a model of the form dist = a + b*speed . You then compute the model predictions at speed = 12 and speed = 13 and take the difference. So that gives (a + b*13) - (a + b*12) = b . Therefore, the confidence interval for the difference should be equal to the confidence interval for my parameter b . Let's check whether this is the case:

     confint(model) 2.5 % 97.5 % (Intercept) -31.167850 -3.990340 speed 3.096964 4.767853
  2. You can see that they are not equal. In your approach, the interval is (4.336198, 3.52862) vs the second approach where the interval is (3.096964, 4.767853)

If I find a good/easy to follow reference on the math behind this, I will link in the comments.

Solution

You can leverage the contrast package for this purpose. Using this package, we can call:

contrast(model, a = list(speed = 13), b = list(speed = 12))

and this would output the required confidence interval for the difference:

lm model parameter contrast

  Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
1 3.932409 0.4155128 3.096964 4.767853 9.46 48        0

Getting back to your example, we can do

contrast(model, a = list(speed = 20), b = list(speed = 12))

to get

lm model parameter contrast

  Contrast     S.E.    Lower    Upper    t df Pr(>|t|)
1 31.45927 3.324102 24.77571 38.14283 9.46 48        0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM