How to calculate the confidence interval for the difference in predictions of the predict() R function?

Question

I'm new to R and I wanted to ask about how to obtain a confidence interval for the difference between two predicted values estimated using predict() function. For example,

data("cars", package = "datasets")

model <- lm(dist ~ speed, data = cars)
summary(model)

prediction1 = predict(model, newdata = list(speed = 12), interval = "confidence")
prediction1

prediction2 = predict(model, newdata = list(speed = 20), interval = "confidence")
prediction2

predict.dif = prediction2 - prediction1
predict.dif

what I got is
prediction1 = 29.60981 (lwr)24.39514 (upr)34.82448
prediction2 = 61.06908 (lwr)55.24729 (upr)66.89088

and for the difference in prediction I get
predict.diff = 31.45927 (lwr)30.85215 (upr)32.06639

My question is, in the previous example, I get the difference in predictions and R also subtracts the confidence intervals. Is it correct to subtract CIs this way? and why? and if not, I was wondering if there is a way to calculate such CI for the difference in predictions.

Thank you so much

Answer 1

You pose a good question. In summary, we should not compute confidence intervals separately and then take the differences between their respective lower bounds and respective upper bounds.

Why?

To explain why, rather than getting too much into the math, I'll illustrate via an example:

Let's take your code snippet and just change prediction2 to be a prediction when speed equals 13:
```
 data("cars", package = "datasets") model <- lm(dist ~ speed, data = cars) summary(model) prediction1 = predict(model, newdata = list(speed = 12), interval = "confidence") prediction1 prediction2 = predict(model, newdata = list(speed = 13), interval = "confidence") prediction2 predict.dif = prediction2 - prediction1 predict.dif
```
In this case your output would be:
```
 fit lwr upr 1 3.932409 4.336198 3.52862
```
But question yourself as to what we're doing here. You have a model of the form dist = a + b*speed . You then compute the model predictions at speed = 12 and speed = 13 and take the difference. So that gives (a + b*13) - (a + b*12) = b . Therefore, the confidence interval for the difference should be equal to the confidence interval for my parameter b . Let's check whether this is the case:
```
 confint(model) 2.5 % 97.5 % (Intercept) -31.167850 -3.990340 speed 3.096964 4.767853
```
You can see that they are not equal. In your approach, the interval is (4.336198, 3.52862) vs the second approach where the interval is (3.096964, 4.767853)

If I find a good/easy to follow reference on the math behind this, I will link in the comments.

Solution

You can leverage the contrast package for this purpose. Using this package, we can call:

contrast(model, a = list(speed = 13), b = list(speed = 12))

and this would output the required confidence interval for the difference:

lm model parameter contrast

  Contrast      S.E.    Lower    Upper    t df Pr(>|t|)
1 3.932409 0.4155128 3.096964 4.767853 9.46 48        0

Getting back to your example, we can do

contrast(model, a = list(speed = 20), b = list(speed = 12))

to get

lm model parameter contrast

  Contrast     S.E.    Lower    Upper    t df Pr(>|t|)
1 31.45927 3.324102 24.77571 38.14283 9.46 48        0

How to calculate the confidence interval for the difference in predictions of the predict() R function?

Question

1 answers

solution1
1 2020-12-09 07:03:17

How to calculate the confidence interval for the difference in predictions of the predict() R function?

Question

1 answers

solution1 1 2020-12-09 07:03:17

solution1
1 2020-12-09 07:03:17