
Risk-free interval in Andersen-Gill counting process formulation for Cox regression using survival::tmerge()

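I want to use the tmerge() function to transform a data set for use in the Andersen-Gill extension of the Cox regression framework for recurrent events. See Therneau's excellent vignette.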

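I want to specify that an individual is not at risk of a repeat event for 30 days after an event. In other words, I want the individual to leave the risk set temporarily, so that an event that occurs while the individual is not at risk is ignored.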

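A naive approach would be to add all events iteratively and then simply add 30 to the tstart variable. However, this can produce rows where tstart >= tstop, which becomes disastrous in larger and more complex data sets.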
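To make that failure mode concrete, here is a minimal sketch on a made-up two-row data frame. The data and the name `naive` are hypothetical and not taken from cgd0. The second interval is shorter than 30 days, so shifting its start produces an invalid row.

```r
# Hypothetical counting-process rows for one subject: an event at day 100,
# then a second event only 20 days later.
naive <- data.frame(id     = c(1, 1),
                    tstart = c(0, 100),
                    tstop  = c(100, 120),
                    infect = c(1, 1))

# Naive immunity: push the start of every interval that follows an event
# forward by 30 days.
after_event <- naive$tstart > 0
naive$tstart[after_event] <- naive$tstart[after_event] + 30

# The second row now starts at day 130 but stops at day 120.
invalid <- naive$tstart >= naive$tstop
```

Rows flagged in `invalid` have to be dropped (or never created) before the data can be passed to a Cox model.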

I have tried to correct the problem described above by calling tmerge() inside a for loop. For this example I use the cgd data from the survival package.

EDIT: see the corrected for loop further down.

library(survival)
cgd0 <- cgd0
newcgd <- tmerge(data1=cgd0[, 1:13], data2=cgd0, id=id, tstop=futime)

for (i in 1:7) {
    x <- paste0("etime", i)  # etime1, ..., etime7

    # iteratively add each event
    newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[, x]))

    # select only observations that end in an event and iteratively create
    # the cumulative number of events for each individual
    newcgd <- tmerge(newcgd, subset(newcgd, infect == 1),
                     id = id, cum_infect = cumtdc(tstop))

    # on each iteration, add 30 days to the start time of the ith cumulative event
    newcgd[which(newcgd$cum_infect == i), "tstart"] <-
        newcgd[which(newcgd$cum_infect == i), "tstart"] + 30

    # on each iteration, remove observations where the start time >= stop time
    newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop), ]
}

attr(newcgd, "tcount")
#            early late gap within boundary leading trailing tied
#infect         0    0   0     44        0       0        0    0
#cum_infect     0    0   0      0       44       0        0    0
#infect         0    0   4     11        0       1        1    0
#cum_infect     0    0   0      0       11       0       45    0
#infect         0    0   2      6        0       0        0    0
#cum_infect     0    0   0      0        6       0       56    0
#infect         0    0   1      2        0       0        0    0
#cum_infect     0    0   0      0        6       0       58    0
#infect         0    0   0      2        0       0        0    0
#cum_infect     0    0   0      0        8       0       58    0
#infect         0    0   0      1        0       0        0    0
#cum_infect     0    0   0      0        9       0       58    0
#infect         0    0   0      1        0       0        0    0
#cum_infect     0    0   0      0       10       0       58    0 

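I believe this solution is correct. However, this is a common problem in survival analysis, and I am concerned that

i) I am overlooking something and the code does not do what I think it does;

ii) I have overlooked an existing, efficient way of doing this in R;

iii) if neither i) nor ii) is the problem, the code is inefficient, and I wonder whether there is an obvious way to speed it up.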
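One possible alternative, offered as a sketch rather than a verified replacement for the loop, is to decide which event times count before building the intervals. The helper names `keep_events` and `risk_intervals` are hypothetical. The rule assumed here is that an event falling inside an immunity window is ignored and does not open a new window, and that the subject re-enters the risk set strictly after day event + 30.

```r
# Keep only the event times that fall outside the immunity window opened by
# the previously kept event. `times` must be sorted; `window` is in days.
keep_events <- function(times, window = 30) {
  kept <- numeric(0)
  last <- -Inf
  for (t in times) {
    if (t > last + window) {   # at risk again strictly after the window
      kept <- c(kept, t)
      last <- t
    }
  }
  kept
}

# Build (tstart, tstop, infect) rows for one subject followed until `futime`.
risk_intervals <- function(times, futime, window = 30) {
  ev     <- keep_events(sort(times[!is.na(times)]), window)
  starts <- c(0, ev + window)   # re-enter the risk set after each window
  stops  <- c(ev, futime)
  status <- c(rep(1, length(ev)), 0)
  ok     <- starts < stops      # drop empty or negative-length intervals
  data.frame(tstart = starts[ok], tstop = stops[ok], infect = status[ok])
}

# Hypothetical subject: events at days 100, 120 and 200, followed to day 300.
one <- risk_intervals(c(100, 120, 200, NA), futime = 300)
```

The event at day 120 falls inside the window opened at day 100 and is dropped, leaving the intervals (0, 100], (130, 200] and (230, 300]. Applying this per id and merging the baseline covariates back in is left out of the sketch.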

-------------------------------------------------- -------------------------------------------------- -------------------------------

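EDIT: Further error checking, with comments. Hopefully this clarifies what I am trying to do. Conceptually, I specify that a person is not at risk of another event for 30 days after experiencing an event. In the Andersen-Gill counting process formulation, each row represents one observation with a start time tstart, a stop time tstop, and an indicator (here infect) that shows whether the observation ended in an event (infect == 1) or in censoring (infect == 0). Here I step through the for loop above manually and, for each iteration, quantify how many events occur and the total follow-up time with and without the 30-day immunity period. The same code is then implemented as a for loop for completeness. The results are shown in a separate code block below.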
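As a small illustration of the two quantities tracked in the checks that follow, here is a sketch on a made-up counting-process data frame. The data and the name `cp` are hypothetical and not derived from cgd0.

```r
# Hypothetical counting-process data: one row per at-risk interval.
# Subject 1 has an event at day 100 and is off risk for days 100 to 130.
cp <- data.frame(id     = c(1, 1, 2),
                 tstart = c(0, 130, 0),
                 tstop  = c(100, 300, 250),
                 infect = c(1, 0, 0))   # 1 = ended in event, 0 = censored

n_events  <- sum(cp$infect)              # total number of events
follow_up <- sum(cp$tstop - cp$tstart)   # total time at risk, in days
```

With the 30-day gap removed, subject 1 contributes 100 + 170 days instead of 300, which is the kind of difference the `futime` comparison is meant to expose.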

newcgd <- tmerge(data1=cgd0[, 1:13], data2=cgd0, id=id, tstop=futime)

###1st event

x <- "etime1"
immunecgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 1), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 1), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 1), "tstart"] <- newcgd[which(newcgd$cum_infect == 1), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime1 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime1 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

###2nd event
x <- "etime2"
immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 2), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 2), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 2), "tstart"] <- newcgd[which(newcgd$cum_infect == 2), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime2 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime2 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

###3rd event
x <- "etime3"
immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 3), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 3), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 3), "tstart"] <- newcgd[which(newcgd$cum_infect == 3), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime3 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime3 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

###4th event
x <- "etime4"
immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 4), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 4), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 4), "tstart"] <- newcgd[which(newcgd$cum_infect == 4), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime4 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime4 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

###5th event
x <- "etime5"
immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 5), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 5), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 5), "tstart"] <- newcgd[which(newcgd$cum_infect == 5), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime5 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime5 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

###6th event
x <- "etime6"
immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 6), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 6), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 6), "tstart"] <- newcgd[which(newcgd$cum_infect == 6), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime6 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime6 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

###7th event
x <- "etime7"
immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
immunecgd[which(immunecgd$cum_infect == 7), "tstart"] <- immunecgd[which(immunecgd$cum_infect == 7), "tstart"] + 30
immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]

newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))
newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
newcgd[which(newcgd$cum_infect == 7), "tstart"] <- newcgd[which(newcgd$cum_infect == 7), "tstart"]
newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

etime7 <- c(sum(immunecgd$infect), sum(newcgd$infect))
futime7 <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))

df_event <- rbind.data.frame(etime1, etime2, etime3, etime4, etime5, etime6, etime7)
colnames(df_event) <- c("immunity", "no_immunity")
df_event$diff <- df_event$no_immunity - df_event$immunity

df_futime <- rbind.data.frame(futime1, futime2, futime3, futime4, futime5, futime6, futime7)
colnames(df_futime)  <- c("immunity", "no_immunity")
df_futime$diff <- df_futime$no_immunity - df_futime$immunity

The same code as a for loop.

 newcgd <- tmerge(data1=cgd0[, 1:13], data2=cgd0, id=id, tstop=futime)
 immunecgd <- tmerge(data1=cgd0[, 1:13], data2=cgd0, id=id, tstop=futime)

event <- matrix(NA, nrow = 7, ncol = 2)
futime <- matrix(NA, nrow = 7, ncol = 2)
for(i in 1:7){        
    x <- paste0("etime", i)  #etime1:etime7

    # iteratively add each event
    immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
    newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))

    # select only observations that end in an event and iteratively create
    # cumulative number of events for each individual
    immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
    newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))

    # for each loop add 30 days to the start time of the ith cumulative event
    immunecgd[which(immunecgd$cum_infect == i), "tstart"] <- immunecgd[which(immunecgd$cum_infect == i), "tstart"] + 30
    newcgd[which(newcgd$cum_infect == i), "tstart"] <- newcgd[which(newcgd$cum_infect == i), "tstart"]

    # on each iteration, remove observations where the start time >= stop time
    immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]
    newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

    event[i,] <- c(sum(immunecgd$infect), sum(newcgd$infect))
    futime[i,] <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))
}

event <- data.frame(event)
colnames(event) <- c("immunity", "no_immunity")
event$diff <- event$no_immunity - event$immunity

futime <- data.frame(futime)
colnames(futime) <- c("immunity", "no_immunity")
futime$diff <- futime$no_immunity - futime$immunity

The error-checking code above gives the following results:

df_event
  immunity no_immunity diff
1       44          44    0
2       56          61    5
3       62          69    7
4       64          72    8
5       66          74    8
6       67          75    8
7       68          76    8

df_futime
  immunity no_immunity diff
1    36202       37477 1275
2    35935       37477 1542
3    35875       37477 1602
4    35875       37477 1602
5    35875       37477 1602
6    35875       37477 1602
7    35875       37477 1602

-------------------------------------------------- -------------------------------------------------- -------------------------------

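Through further testing on different data sets in the survival package, on simulated data sets, and on my own data (the data I want to use this code on), I discovered a "glitch". In the version of the code above, if a new event etime[i-1] falls within one of the periods in which we have specified that the individual is immune to events (which is exactly the situation this code is meant to create), that event is not incorporated into the cumulative event counter cum_infect. On the next iteration, for etime[i], the individual will have only [i-1] cumulative events, and the part of the code that controls whether 30 days should be added to the start time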

immunecgd[which(immunecgd$cum_infect == i), "tstart"] <- immunecgd[which(immunecgd$cum_infect == i), "tstart"] + 30

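will not identify the individual as having had an event. This means that the for loop only adds the 30-day immunity correctly until the first time an event falls inside such an immunity period. I have written a fix that is not very pretty, but it works.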
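The difference between the two tests can be seen on a toy example. The vectors below are hypothetical per-row counters for a single subject whose second event fell inside an immunity window, and they are assumed to be aligned row by row, as the cum_infect and cum_infect_(i-1) columns are in the fix.

```r
# Hypothetical counters for one subject at iteration i = 3.
# The second event was ignored, so the counter only reaches 2 even though
# three event times have been processed.
i        <- 3
cum_prev <- c(0, 1, 1)   # counter after iteration i - 1
cum_now  <- c(0, 1, 2)   # counter after iteration i

shift_old <- cum_now == i                       # original test
shift_new <- cum_now > 0 & cum_now > cum_prev   # revised test
```

The original test selects no row, so no immunity is added after the newest event. The revised test selects the last row, where the counter has just increased.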

newcgd <- tmerge(data1=cgd0[, 1:13], data2=cgd0, id=id, tstop=futime)
immunecgd <- tmerge(data1=cgd0[, 1:13], data2=cgd0, id=id, tstop=futime)
newcgd$cum_infect_0 <- 0
immunecgd$cum_infect_0 <- 0
event <- matrix(NA, nrow = 7, ncol = 2)
futime <- matrix(NA, nrow = 7, ncol = 2)
for(i in 1:7){        
    x <- paste0("etime", i)  #etime1:etime7

    # iteratively add each event
    immunecgd <- tmerge(immunecgd, cgd0, id = id, infect = event(cgd0[,x]))
    newcgd <- tmerge(newcgd, cgd0, id = id, infect = event(cgd0[,x]))

    # select only observations that end in an event and iteratively create
    # cumulative number of events for each individual
    immunecgd <- tmerge(immunecgd, subset(immunecgd, infect == 1), id = id, cum_infect = cumtdc(tstop))
    newcgd <- tmerge(newcgd, subset(newcgd, infect == 1), id = id, cum_infect = cumtdc(tstop))

    # create new column that will hold cumulative events between loops
    immunecgd[, paste0("cum_infect_", i)] <- immunecgd[, "cum_infect"]
    newcgd[, paste0("cum_infect_", i)] <- newcgd[, "cum_infect"]

    # on each iteration, add 30 days to the start time if there is at least one
    # cumulative event and the ith cumulative count exceeds the (i-1)th count
    immunecgd[which(immunecgd$cum_infect > 0 & immunecgd$cum_infect > immunecgd[, paste0("cum_infect_", i - 1)]), "tstart"] <-
        immunecgd[which(immunecgd$cum_infect > 0 & immunecgd$cum_infect > immunecgd[, paste0("cum_infect_", i - 1)]), "tstart"] + 30
    newcgd[which(newcgd$cum_infect > 0 & newcgd$cum_infect > newcgd[, paste0("cum_infect_", i - 1)]), "tstart"] <-
        newcgd[which(newcgd$cum_infect > 0 & newcgd$cum_infect > newcgd[, paste0("cum_infect_", i - 1)]), "tstart"]

    # on each iteration, remove observations where the start time >= stop time
    immunecgd <- immunecgd[which(immunecgd$tstart < immunecgd$tstop),]
    newcgd <- newcgd[which(newcgd$tstart < newcgd$tstop),]

    event[i,] <- c(sum(immunecgd$infect), sum(newcgd$infect))
    futime[i,] <- c(sum(immunecgd$tstop - immunecgd$tstart), sum(newcgd$tstop - newcgd$tstart))
}
immunecgd <- immunecgd[,!grepl("cum_infect_", colnames(immunecgd))]
newcgd <- newcgd[,!grepl("cum_infect_", colnames(newcgd))]

event <- data.frame(event)
colnames(event) <- c("immunity", "no_immunity")
event$diff <- event$no_immunity - event$immunity

futime <- data.frame(futime)
colnames(futime) <- c("immunity", "no_immunity")
futime$diff <- futime$no_immunity - futime$immunity 

Here we can see the difference in the total number of events:

  immunity no_immunity diff
1       44          44    0
2       56          61    5
3       62          69    7
4       64          72    8
5       65          74    9
6       66          75    9
7       66          76   10

With the for loop specified correctly, 2 additional events are found to have occurred within an immunity period.

Following up on my comment, this is what I see when I try to implement it in code:

 with( newcgd, table( tstart-tstop <= 30, infect))
 #-------------
      infect
         0   1
  TRUE 120  68

So, if I understand your goal correctly, I don't think you have got there yet, and I wonder whether something has gone wrong, because:

> newcgd$infect <- with( newcgd,ifelse(infect, tstart-tstop > 30, 0 ) )
> with( newcgd, table( tstart-tstop <= 30, infect))
      infect
         0
  TRUE 188

When I set all of the short-interval events to 0, nothing is left. But perhaps I have not understood the problem yet?
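One thing worth checking in the two snippets above: every row kept by the loop satisfies tstart < tstop, so tstart - tstop is negative and the test tstart - tstop <= 30 is TRUE for every row, which would explain a table with a single TRUE row. A minimal sketch on made-up rows (hypothetical data, not cgd0) compares that test with one based on the interval length tstop - tstart:

```r
# Hypothetical counting-process rows with tstart < tstop throughout.
rows <- data.frame(tstart = c(0, 130, 230),
                   tstop  = c(100, 150, 300),
                   infect = c(1, 1, 0))

always_true <- rows$tstart - rows$tstop <= 30   # negative values: all TRUE
short_gap   <- rows$tstop - rows$tstart <= 30   # interval length in days
```

Only the second row, which spans 20 days, is flagged by `short_gap`.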
