
Interpretation of confidence intervals of LOESS regression

I have a data frame that looks like this:

days <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6,6,6,7,7,7,7,7,7)
values <- c(3,4,5,6,7,8,1,4,5,9,9,10,3,4,7,7,7,10,4,6,6,8,9,9,1,1,2,3,3,6,2,2,3,5,6,6,6,7,7,8,9,9)

df <- data.frame(days,values)

In order to identify a signal in my data, I decided to apply a simple loess smoothing:

## fit a LOESS smoother and compute point-wise predictions with standard errors
loess <- loess(df$values ~ df$days, span = 1)
pred <- predict(loess, loess$x, se = TRUE)

plot(days,values)

lines(loess$x, loess$fitted, col="red", lwd=1.5)

## dashed blue lines: point-wise 95% confidence band around the fit
lines(loess$x, pred$fit - qt(0.975, pred$df) * pred$se, lty = "dashed", col = "blue", lwd = 1)
lines(loess$x, pred$fit + qt(0.975, pred$df) * pred$se, lty = "dashed", col = "blue", lwd = 1)

The results, including the confidence intervals, look like this: [plot "df_loess_95ci": scatter of the data with the red LOESS curve and dashed blue 95% confidence bands]

I see that, for instance, day 3 has higher values than day 5, but is this difference significant? Should I compare the confidence intervals of each day (which, in this situation, overlap), or should I base my interpretation on the LOESS curve vs. the CIs (e.g., at day 3 the curve lies outside the confidence interval of day 5)?

You have misunderstood the confidence intervals here. For regression problems of type:

observations = signal + noise

we often produce point-wise confidence intervals for the estimated signal. This convention applies to linear models, generalized linear models, nonparametric regression, and so on.

A point-wise confidence interval is easy to compute and helps you assess how variable the estimate is at each individual point, but it does not support comparisons across points. In your case, the fact that the point-wise confidence bands for two days overlap does not imply that there is no statistically significant difference between those days.
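For a concrete check, here is a small sketch of mine (not part of the original answer) that computes an ordinary 95% confidence interval for the mean of each day from the raw data; the helper name ci is my own. In this data the intervals for day 3 and day 5 overlap, yet the test shown next still finds a significant difference.

## per-day 95% CIs for the group means (the raw data, not the loess bands)
ci <- function(x) mean(x) + c(-1, 1) * qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))
## one column per day: lower and upper 95% limits for that day's mean
sapply(split(df$values, df$days), ci)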

If you really want to test whether there is a significant difference between the observations from different days, for example whether day 3 and day 5 differ, you should run a t-test:

day3 <- df$values[df$days == 3]
day5 <- df$values[df$days == 5]
## test whether the two groups have the same mean, i.e., mean difference = 0
t.test(day3, day5, mu = 0)$p.value
## 0.0177

Now you can see that day 3 and day 5 are statistically different at the 0.05 significance level.

More generally, you should do a pairwise t-test:

pairwise.t.test(df$values, df$days, p.adjust.method = "none", pool.sd = FALSE)

#    Pairwise comparisons using t tests with non-pooled SD 

# data:  df$values and df$days 

#   1       2       3       4       5       6      
# 2 0.62614 -       -       -       -       -      
# 3 0.52954 1.00000 -       -       -       -      
# 4 0.20951 0.69979 0.62189 -       -       -      
# 5 0.02519 0.05772 0.01775 0.00307 -       -      
# 6 0.19799 0.19597 0.10104 0.02372 0.24736 -      
# 7 0.04247 0.41763 0.27750 0.50416 0.00044 0.00355

# P value adjustment method: none 

Note the use of pairwise.t.test here:

  • by setting pool.sd = FALSE, no common (pooled) standard deviation is estimated across all groups;
  • by setting p.adjust.method = "none", the raw p-values returned by t.test are not adjusted for multiple comparisons.

In this way, you can see that the p-value at (5, 3) matches what we got from the earlier t-test, as the quick check below confirms.
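As a small sketch of mine (not from the original answer), you can pull that entry directly out of the returned p-value matrix and compare it with the two-sample test above:

pw <- pairwise.t.test(df$values, df$days, p.adjust.method = "none", pool.sd = FALSE)
pw$p.value["5", "3"]                # same comparison as before, ~0.018
t.test(day3, day5, mu = 0)$p.value  # ~0.018

If you leave p.adjust.method at its default ("holm"), the same entries come back adjusted for the number of comparisons.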

Your data happen to have replicated observations for each day, so you are somewhat lucky. For a real time series there is usually no replication within a day, and in that case there is no pairwise t-test for you to do.

I think this question might be better suited to https://stats.stackexchange.com/

To answer your question though, you want to compare the CIs as you would if you were comparing two estimates not in a time series.

NB Overlapping CIs do not necessarily mean that there is no significant difference.
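If you do want a rough comparison on the LOESS scale itself, here is a sketch of mine (not the answerer's code) that reuses the loess object from the question. It compares the fitted values at day 3 and day 5 using their point-wise standard errors, ignoring the covariance between the two fitted values, so treat the resulting interval as an approximation only:

pr <- predict(loess, c(3, 5), se = TRUE)          # fitted values at days 3 and 5
diff_fit <- pr$fit[1] - pr$fit[2]
se_diff <- sqrt(pr$se.fit[1]^2 + pr$se.fit[2]^2)  # ignores cov(fit3, fit5)
diff_fit + c(-1, 1) * qt(0.975, pr$df) * se_diff  # rough 95% CI for the difference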
