简体   繁体   中英

Implausibly wide confidence intervals produced by survfit() for interval censored data

I have data which is generated by intermittent interviews in which an individual is asked whether they are experiencing a certain symptom. The last time each individual was known to not have this particular symptom, is denoted as tstart . If applicable, the time at which the individual is observed to have developed the symptom is tstop . Using the survival package in R, a survival object is created with the Surv function, specifying that this is interval censored data. I would like a non-parametric maximum likelihood estimate of the survival function. This can be accomplished using the survfit function, which seems to pass the call to an internal function survfitTurnbull . The resulting confidence intervals are implausibly wide. I am unable to figure out why this is the case.

# A random sample of the data using dput()
structure(list(tstart = c(0.01, 38, 0.01, 0.01, 23, 26, 0.01, 
19, 0.01, 0.01, 22, 6, 0.01, 14, 16, 0.01, 0.01, 0.01, 0.01, 
21, 15, 0.01, 0.01, 13, 10, 0.01, 0.01, 19, 0.01, 0.01, 0.01, 
0.01, 22, 17, 27, 14, 16, 0.01, 20, 27, 10, 0.01, 0.01, 16, 20, 
7, 6, 15, 0.01, 0.01), tstop = c(4.01, NA, 5.01, 8.01, NA, NA, 
5.01, NA, 3.01, 16.01, NA, 6.01, 8.01, NA, NA, 7.01, 16.01, 1.01, 
10.01, NA, NA, 5.01, 8.01, NA, NA, 2.01, 3.01, NA, 7.01, 5.01, 
2.01, 9.01, NA, NA, NA, NA, NA, 10.01, NA, NA, NA, 5.01, 10.01, 
NA, NA, NA, 7.01, NA, 14.01, 4.01)), row.names = c(NA, -50L), class = "data.frame")

survObj <- with(temp_df, Surv(time = tstart, time2 = tstop, type = "interval2"))
survFit <- survfit(SurvObj ~ 1))
summary(survFit)

The confidence interval does not narrow over time. It is no narrower using the whole dataset (which is contains approximately 10 times the number of events). I am unable to figure out what is going wrong.

For what it's worth, this does not look like a bug in the software , but rather a potential limitation of using something as flexible as the non-parameteric maximum likelihood estimator (NPMLE, also known as the Turnbull estimator, which survfit is fitting if you give it interval censored data) for estimating a survival curve. The TLDR version of this answer is that I suggest you use a parametric model such as Weibull, either using survival::survreg , icenReg::ic_par or icenReg::ic_bayes . Admission of bias: I'm the author of icenReg.

A somewhat technical but very relevant note about the NPMLE is that it only assigns positive probability mass to Turnbull Intervals, which are intervals defined as having the left side of the interval being the left side of some observation interval and the right side of the Turnbull Interval being the next closest right-side of any of the observation intervals. To illustrate, I've plotted your observation intervals and the corresponding Turnbull intervals.

在此处输入图像描述

Note that there is a huge gap between the last two Turnbull intervals, This leads to a very "jumpy" NPMLE. which also leads to quite a bit of error in-between the jumps.

After having spent a long time thinking about this issue, my quick summary is that this is a consequence of having only mildly informative data and too much flexibility. In most survival analysis cases, it is reasonable to assume a smooth survival curve, such as a parametric distribution. As long as the distribution is not too overly restrictive (read: the one parameter exponential distribution), this mild assumption of smoothness allows you to gain much more information out of your data without introducing too much bias.

To illustrate, I've attached a plot of a Weibull fit + confidence intervals and the fitted NPMLE next to it.

在此处输入图像描述

FYI, the box that you see with the NPMLE is not a confidence interval, but rather that the NPMLE is only unique up to the probability assigned to each Turnbull interval, but how that probability is distributed within the interval does not affect the log-likelihood. So any survival curve that passes through that box maximizes the log-likelihood.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM