Survival times in survival analysis
I'm having trouble analyzing a survival dataset. I have put the dput output of the dataset in a GitHub gist so as not to clutter the question.
Here is the data: https://gist.github.com/anonymous/4fdff1c6d0853c41939e2a67d9e0e45b
In this dataset, I want to plot survival curves for each group, so I need to fit a survfit() model.
The variables W1, W2, ..., W43 represent weeks, and the numbers represent some measure. A dot (.) in any week means that the individual died that week, and consequently every following week is also marked with a dot.
In a survival model this death represents an event (failure), and an individual who survives all the weeks represents a censored observation.
To fit a survival model the way I know how, I need data like this:
time=c(3,4,8,8,5,2)
event=c(1,1,0,0,1,1)
In this case time is the time of death in weeks, and event is 1 for a death and 0 if censored.
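As a minimal sketch of how vectors in this form feed into a survival fit, the example below pairs the six values above with Surv() and fits an ungrouped Kaplan-Meier curve. It assumes the survival package is installed (it ships with standard R installations as a recommended package); the object names s and fit are only illustrative.

```r
library(survival)

time  <- c(3, 4, 8, 8, 5, 2)
event <- c(1, 1, 0, 0, 1, 1)

# Surv() pairs each time with its status; censored times print with a "+"
s <- Surv(time, event)
print(s)

# Kaplan-Meier estimate for the whole sample (no grouping variable)
fit <- survfit(Surv(time, event) ~ 1)
summary(fit)
```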
EDIT: I thought of one possible solution, but I don't know how to implement it. The idea is as follows:
1) Take all the columns W1, W2, ..., W43 and put 1 if the value is a number and 0 if it is a dot.
2) Create a new variable that represents time; its value is the sum of columns W1 to W43, that is, W1+W2+...+W43.
3) Create a new variable that represents the event: if time=43, the individual survived the whole period, so the event is 0 (censored); if time is less than 43, the individual died, so the event is 1.
Can anyone help me do this?
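The three steps above could be sketched in base R roughly as follows. The data frame toy is a made-up stand-in with four week columns rather than 43, and the names wcols, n_weeks and alive are only illustrative; it assumes the week columns are read as character with "." marking death.

```r
# Hypothetical toy data: three individuals, four weeks
toy <- data.frame(
  W1 = c("10", "12", "9"),
  W2 = c("11", ".",  "8"),
  W3 = c(".",  ".",  "7"),
  W4 = c(".",  ".",  "6"),
  stringsAsFactors = FALSE
)

wcols   <- grepl("^W[0-9]+$", names(toy))  # select only the week columns
n_weeks <- sum(wcols)

# Steps 1 and 2: TRUE (1) for a number, FALSE (0) for a dot, then sum per row
alive    <- toy[, wcols] != "."
toy$time <- rowSums(alive)

# Step 3: fewer observed weeks than the total means the individual died
toy$event <- as.numeric(toy$time < n_weeks)

toy[, c("time", "event")]
```

Note that this counts the weeks that have a measurement, so an individual whose first dot is in week 3 gets time = 2 rather than 3.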
I named your dataset sdat, and these operations add the two additional columns:
# Work by rows on the "W" columns: sequence number of the first "." (NA if none)
sdat$time <- apply(sdat[, grepl("^W", names(sdat))], 1,
                   function(r) which(r == ".")[1])
sdat$event <- as.numeric(!is.na(sdat$time))           # NA (no dot) becomes 0, otherwise 1
sdat$time <- ifelse(is.na(sdat$time), 43, sdat$time)  # set time to 43 for survivors
# Check results
head(sdat[, !grepl("^W", names(sdat))])  # drop the "W" columns for display
  Group Ref Sex  M1   M2 M3  M4 time event
1    11   4   1 959 1940 10 184   23     1
2    11   4   1 960 1770 10 189   31     1
3    11   4   1 961 1970 10 166   23     1
4    11   4   1 962 1870  1 180   43     0
5    11   4   1 964 1780 11 239   43     0
6    12   4   1 966 1980 11 182   43     1
As an analyst I would be asking what meaning to attach to the varying "W" numbers leading up to the deaths, but that was not your question.
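Once time and event exist, the per-group curves the question asks for might be obtained along these lines. This is only a sketch: toy is a hypothetical stand-in with the same Group, time and event layout as sdat, its values are invented, and it assumes the survival package is installed.

```r
library(survival)

# Hypothetical stand-in for sdat with only the columns the fit needs
toy <- data.frame(
  Group = c(11, 11, 11, 12, 12, 12),
  time  = c(23, 31, 43, 43, 17, 43),
  event = c(1,  1,  0,  1,  1,  0)
)

# One Kaplan-Meier curve per level of Group
fit <- survfit(Surv(time, event) ~ Group, data = toy)

# Step curves, with line type distinguishing the groups
plot(fit, lty = 1:2, xlab = "Weeks", ylab = "Survival probability")
legend("bottomleft", legend = names(fit$strata), lty = 1:2)
```

With the real data, replacing toy by sdat in the survfit() call should be enough, since the formula only refers to the time, event and Group columns.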