
Survival times in survival analysis

I'm having trouble analyzing a survival dataset. I have put the dput output of the dataset in a GitHub gist so as not to clutter the question.

Here is the data: https://gist.github.com/anonymous/4fdff1c6d0853c41939e2a67d9e0e45b

In this dataset, I want to plot survival curves for each group, so I need to fit a survfit() model.

The variables W1, W2, ..., W43 represent weeks, and the numbers are some measurement. A dot (.) in a given week means that the individual died that week, and consequently every following week is also flagged with a dot.

In a survival model this death is an event (failure), and an individual who survives all the weeks is a censored observation.

To fit a survival model the way I know how, I need data like this:

time=c(3,4,8,8,5,2)
event=c(1,1,0,0,1,1)

In this case time is the week of death (or the last week observed, for a censored individual), and event is 1 for a death and 0 for a censored observation.
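For reference, here is a minimal sketch of how vectors in that shape are usually passed to the survival package. It uses only the six toy values above, and the intercept-only formula (~ 1) is an assumption made to keep the example short:

```r
library(survival)

# Toy values from the question: time in weeks, event = 1 (death) or 0 (censored)
time  <- c(3, 4, 8, 8, 5, 2)
event <- c(1, 1, 0, 0, 1, 1)

# Surv() pairs each time with its status; censored times print with a "+"
s <- Surv(time, event)
print(s)

# Kaplan-Meier estimate for a single group (no grouping variable)
fit <- survfit(s ~ 1)
summary(fit)
```

Replacing ~ 1 with a grouping variable gives one curve per group.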

EDIT: I thought of one possible solution, but I don't know how to implement it. The idea is below.

1) Take all the columns W1, W2, ..., W43 and put 1 if the value is a number and 0 if it is a dot.

2) Create a new variable that represents time; its value is the sum of columns W1 to W43, that is, W1+W2+...+W43.

3) Create a new variable that represents the event: if time=43 the individual survived the whole period, so it is 0 (censored); if it is less than 43 the individual died, so it is 1.
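The three steps above can be sketched roughly as follows. The data frame toy is made up for illustration (three individuals, five weeks instead of 43), and it assumes the W columns are stored as character values with "." marking death:

```r
# Hypothetical toy data: "." marks the week of death and every week after it
toy <- data.frame(
  Group = c(11, 11, 12),
  W1 = c("10", "12", "9"),
  W2 = c("11", ".",  "9"),
  W3 = c("12", ".",  "8"),
  W4 = c(".",  ".",  "8"),
  W5 = c(".",  ".",  "7"),
  stringsAsFactors = FALSE
)

wcols   <- grep("^W[0-9]+$", names(toy), value = TRUE)  # the week columns
n_weeks <- length(wcols)                                # 43 in the real data

alive     <- toy[wcols] != "."                   # step 1: TRUE (1) if a number, FALSE (0) if a dot
toy$time  <- as.numeric(rowSums(alive))          # step 2: W1 + W2 + ... over the 0/1 flags
toy$event <- as.numeric(toy$time < n_weeks)      # step 3: 1 = died, 0 = censored

toy[c("Group", "time", "event")]
```

Note that this counts the weeks with a measurement, so an individual whose first dot is in week k gets time k - 1, and an individual who dies in the very last week would have to be handled separately.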

Can anyone help me do this?

I named your dataset sdat, and these operations add the two additional columns:

sdat$time <- apply(sdat[ , grepl("^W", names(sdat))], 1,  # work by rows on the "W" columns
                   function(r) which(r == ".")[1])         # week index of the first "." (NA if none)
sdat$event <- as.numeric(!is.na(sdat$time))                # 1 if a "." was found (death), 0 otherwise
sdat$time <- ifelse(is.na(sdat$time), 43, sdat$time)       # set time to 43 for survivors

# Check results
head(sdat[ , !grepl("^W", names(sdat))])  # show everything except the "W" columns
  Group Ref Sex  M1   M2 M3  M4 time event
1    11   4   1 959 1940 10 184   23     1
2    11   4   1 960 1770 10 189   31     1
3    11   4   1 961 1970 10 166   23     1
4    11   4   1 962 1870  1 180   43     0
5    11   4   1 964 1780 11 239   43     0
6    12   4   1 966 1980 11 182   43     1
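With time and event in place, the per-group curves asked for in the question could be drawn along these lines. This is only a sketch: demo is a small made-up stand-in for sdat (the real data lives in the gist), and the colours and axis labels are arbitrary choices:

```r
library(survival)

# Hypothetical stand-in for sdat with just the columns the model needs
demo <- data.frame(
  Group = c(11, 11, 11, 11, 12, 12, 12, 12),
  time  = c(23, 31, 43, 43, 12, 20, 43, 43),
  event = c(1,  1,  0,  0,  1,  1,  1,  0)
)

# One Kaplan-Meier curve per level of Group
fit <- survfit(Surv(time, event) ~ Group, data = demo)

cols <- c("black", "red")
plot(fit, col = cols, xlab = "Weeks", ylab = "Survival probability")
legend("bottomleft", legend = names(fit$strata), col = cols, lty = 1)
```

On the real data the same call would use data = sdat, with one colour per group.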

As an analyst I would be asking what meaning to attach to the varying "W" values leading up to the deaths, but that was not your question.


 