Calculate the proc lifetest 95% CI for median survival time using the R survival package

I have been trying to replicate the results of proc lifetest in SAS using R (the survival package and its survfit function), and in particular to calculate the 95% confidence interval for the median survival time.

I know that SAS uses the following formula to calculate the confidence interval for the median:

abs((g(S(t)) - g(1-0.5)) / (g'(S(t)) * σ(S(t)))) <= 1.96

where g'(x) is the first derivative of g(x), σ(S(t)) is the standard error of the survival curve, and the default transformation g in SAS is g(x) = log(-log(x)).

So the expression inside the absolute value becomes:

(log(-log(S(t)))-log(-log(0.5)))*S(t)*log(S(t))/σ(S(t))
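
For reference, the step from the general formula to this expression can be written out as follows. This is only a restatement using the symbols already defined above, with the chain rule applied to the log-log transformation:

```latex
\[
g(x) = \log(-\log x), \qquad
g'(x) = \frac{1}{-\log x}\cdot\left(-\frac{1}{x}\right) = \frac{1}{x \log x},
\]
\[
\frac{g(S(t)) - g(0.5)}{g'(S(t))\,\sigma(S(t))}
= \frac{\bigl[\log(-\log S(t)) - \log(-\log 0.5)\bigr]\, S(t)\,\log S(t)}{\sigma(S(t))}.
\]
```

Note that σ(S(t)) here is the standard error of S(t) itself, not of a transformed quantity.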

Here is an example using the kidney data from the survival package:

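library(survival)
# Kaplan-Meier fit by sex; print() reports the median and its 95% CI per stratum
fit1 <- survfit(Surv(kidney$time, kidney$status) ~ kidney$sex, data = kidney)
print(fit1)
# TRUE where the log-log test statistic for the median lies within +/-1.96
BCinds <- abs((log(-log(fit1$surv)) - log(-log(0.5))) * fit1$surv * log(fit1$surv) / fit1$std.err) <= 1.96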

When I run the code, print(fit1) gives:

                n events median 0.95LCL 0.95UCL
kidney$sex=1 20     18     22      12      63
kidney$sex=2 56     40    130      66     190

However, when I calculate it through BCinds I get a very different and much wider CI: (9, 154) for sex=1 and (39, 511) for sex=2.

sex=1 95%CI: (9, 154)  sex=2 95%CI: (39, 511)

The SAS code also produces different confidence intervals for the median survival time on the same dataset:

ods graphics on;
proc lifetest data=work.test
    plots=survival(nocensor cb=hw cl strata=panel);
    strata sex / group=sex;
    time time*status(0);
run;
ods graphics off;

This results in the following:

 sex=1: median=22 and 95%CI: (12, 30)
 sex=2: median=130 and 95%CI: (58,185)

Any ideas on why I get such different results? Also, could you suggest how I could automate the final step of the method? At the moment I read the interval off the BCinds vector by eye, but I would like to put this in a loop, so I need to do it automatically.

Thanks!

Update

After "randomly" typing arguments into the R code I managed to solve part of my problem.

By default, survfit calculates the confidence interval for the median using the log transformation in the formula given above, whereas SAS uses the log-log transformation by default. That is why the intervals from R and SAS disagree.

So by adding an argument to the survfit call we can force R to calculate the confidence intervals the same way SAS does. For the example I gave above with the kidney data we have:

survfit(Surv(kidney$time, kidney$status) ~ kidney$sex, conf.type = "log-log")
Call: survfit(formula = Surv(kidney$time, kidney$status) ~ kidney$sex,
    conf.type = "log-log")

              n events median 0.95LCL 0.95UCL
kidney$sex=1 20     18     22      12      30
kidney$sex=2 56     40    130      58     185

The confidence interval types that survfit accepts through conf.type include "log" (the default), "log-log", "plain" and "none".
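
To see how much the choice of conf.type moves the interval for the median, the fit can be repeated for each type and the limits collected in one table. This is a minimal sketch: the names types, res and out are made up for illustration, and it uses the formula form Surv(time, status) ~ sex with data = kidney, so the strata are labelled sex=1 and sex=2 rather than kidney$sex=1 and kidney$sex=2.

```r
library(survival)

# Interval types to compare ("none" is skipped because it produces no limits)
types <- c("log", "log-log", "plain")

res <- lapply(types, function(ct) {
  fit <- survfit(Surv(time, status) ~ sex, data = kidney, conf.type = ct)
  tab <- summary(fit)$table   # one row per stratum, includes median and its CI
  data.frame(conf.type = ct,
             stratum   = rownames(tab),
             median    = tab[, "median"],
             lower     = tab[, "0.95LCL"],
             upper     = tab[, "0.95UCL"],
             row.names = NULL)
})

out <- do.call(rbind, res)
print(out)
```

The "log" and "log-log" rows should correspond to the two print outputs quoted in this post, provided the kidney data in the installed version of survival matches the one used here.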

I still haven't figured out what is wrong with the code I used to get the confidence interval by hand, so if anyone has any idea, I would appreciate the feedback.

I guess it's because of the fit1$std.err part in BCinds. There you are supposed to plug in the standard error of S(t), but fit1$std.err (according to the R documentation of survfit.object) gives you the standard error of the cumulative hazard, i.e. of -log(survival). Try using summary(fit1)$std.err instead.
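
A sketch of how that fix and the automation asked for in the question might look. It rests on two assumptions: multiplying fit$std.err by fit$surv gives the standard error of S(t), which is the delta-method conversion and avoids the length mismatch that arises because summary() only keeps event times by default; and the interval is taken as the smallest and largest time that satisfy the inequality, an endpoint convention that may not match what SAS or survfit report exactly. The names se_surv, z, inside and ci are made up for illustration.

```r
library(survival)

# Formula form, so the strata are labelled "sex=1" and "sex=2"
fit <- survfit(Surv(time, status) ~ sex, data = kidney)

# fit$std.err is the standard error of -log(S(t)); scale by S(t) to get se of S(t)
surv    <- fit$surv
se_surv <- fit$std.err * surv

# Statistic from the question with g(x) = log(-log(x)); NaN where S(t) is 0 or 1
z      <- (log(-log(surv)) - log(-log(0.5))) * surv * log(surv) / se_surv
inside <- !is.na(z) & abs(z) <= qnorm(0.975)

# Label every time point with its stratum, then take the range of retained times
stratum <- rep(names(fit$strata), fit$strata)
ci <- t(sapply(split(fit$time[inside], stratum[inside]), range))
colnames(ci) <- c("lower", "upper")
print(ci)
```

Because everything is vectorised over the strata, no explicit loop is needed. The same lines can be wrapped in a function and applied to other fits.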
