R: Days since last event per ID
I am interested in finding the number of days since the last event for each ID. The data looks like this:
df <- data.frame(date=as.Date(
c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001",
"06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"), "%d/%m/%Y"),
event=c(0,0,1,0,1, 1,0,0,0,1),id = c(rep(1,5),rep(2,5)))
date event id
1 2000-07-06 0 1
2 2000-09-15 0 1
3 2000-10-15 1 1
4 2001-01-03 0 1
5 2001-03-17 1 1
6 2010-08-06 1 2
7 2010-09-15 0 2
8 2010-10-15 0 2
9 2011-01-03 0 2
10 2011-03-17 1 2
I have borrowed heavily from the data.table solution here, but it does not take the ID into account.
library(data.table)
setDT(df)
setkey(df, date,id)
df = df[event == 1, .(lastevent = date), key = date][df, roll = TRUE]
df[, tae := difftime(lastevent, shift(lastevent, 1L, "lag"), unit = "days")]
df[event == 0, tae:= difftime(date, lastevent, unit = "days")]
It produces the following output:
date lastevent event id tae
1: 2000-07-06 <NA> 0 1 NA days
2: 2000-09-15 <NA> 0 1 NA days
3: 2000-10-15 2000-10-15 1 1 NA days
4: 2001-01-03 2000-10-15 0 1 80 days
5: 2001-03-17 2001-03-17 1 1 153 days
6: 2010-08-06 2010-08-06 1 2 3429 days
7: 2010-09-15 2010-08-06 0 2 40 days
8: 2010-10-15 2010-08-06 0 2 70 days
9: 2011-01-03 2010-08-06 0 2 150 days
10: 2011-03-17 2011-03-17 1 2 223 days
However, my desired output is as follows:
date lastevent event id tae
1: 2000-07-06 <NA> 0 1 NA days
2: 2000-09-15 <NA> 0 1 NA days
3: 2000-10-15 2000-10-15 1 1 NA days
4: 2001-01-03 2000-10-15 0 1 80 days
5: 2001-03-17 2001-03-17 1 1 153 days
6: 2010-08-06 2010-08-06 1 2 NA days
7: 2010-09-15 2010-08-06 0 2 40 days
8: 2010-10-15 2010-08-06 0 2 70 days
9: 2011-01-03 2010-08-06 0 2 150 days
10: 2011-03-17 2011-03-17 1 2 223 days
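The only difference is the NA in row 6 of the tae column. This is a related unanswered post. I have looked here, but that solution does not work for my case. There are many other similar questions, but none of them compute this per ID. Thanks!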
library(data.table)
df <- data.table(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"),
"%d/%m/%Y"), event=c(0,0,1,0,1, 1,0,1,0,1),id = c(rep(1,5),rep(2,5)))
# event rows: days since the previous event within the same id
tempdt <- df[event==1,]
tempdt[,tae := date - shift(date), by = id]
# bring tae back onto the full table (NA for the non-event rows)
df <- merge(df, tempdt, by = c("date", "event", "id"), all.x = TRUE)
# rows whose previous row (within id) is an event: days since that row
df[, tae := ifelse(shift(event)==1, date - shift(date), tae), by = id]
Edit
A more general solution:
df <- data.table(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001", "18/03/2001",
"06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011","19/03/2011"),
"%d/%m/%Y"),
event=c(1,0,0,0,0,0,1,1,1,0,1,0),id = c(rep(1,6),rep(5,6)))
##for event = 1 observations
tempdt <- df[event==1,]
tempdt[,tae := date - shift(date), by = id]
df <- merge(df, tempdt, by = c("date", "event", "id"), all.x = TRUE)
##for event = 0 observations
for(d in df[event==0, date]){
# print(as.Date(d, origin = "1970-01-01"))
df[date == d & event == 0, tae := as.Date(d, origin = "1970-01-01") -
max(df[date<d & event==1,date]), by = id]
}
Edit 2: There must be a faster way to do this, but this version does not produce any warnings when the first observation has event = 0.
df <- data.table(date=as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001","06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"),
"%d/%m/%Y"), event=c(0,0,1,0,1, 1,0,0,0,1),id = c(rep(1,5),rep(2,5)))
tempdt <- df[event==1,]
tempdt[,tae := date - shift(date), by = id]
df <- merge(df, tempdt, by = c("date", "event", "id"), all.x = TRUE)
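for(i in unique(df[,id])){
# print(i)
# only rows dated after this id's first event have a previous event
for(d in df[id == i & date>df[id == i & event==1,min(date)] & event==0, date]){
# print(as.Date(d, origin = "1970-01-01"))
# restrict the lookup to the same id so another id's event is never used
df[id == i & date == d & event == 0,
tae := as.Date(d, origin = "1970-01-01") - max(df[id == i & date<d &
event==1,date])]
}
}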
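As for a faster way, here is a minimal loop-free sketch rather than a definitive implementation. It assumes the rows can be sorted by date within each id, and it stores tae as a plain number of days instead of a difftime. The idea is that both kinds of row need the same thing: the latest event date strictly before the current row, within the same id. A running maximum over the event dates, shifted down by one row, gives exactly that.

```r
library(data.table)

df <- data.table(
  date  = as.Date(c("06/07/2000","15/09/2000","15/10/2000","03/01/2001","17/03/2001",
                    "06/08/2010","15/09/2010","15/10/2010","03/01/2011","17/03/2011"),
                  "%d/%m/%Y"),
  event = c(0,0,1,0,1, 1,0,0,0,1),
  id    = c(rep(1,5), rep(2,5)))

# Assumption: rows must be in date order within each id.
setorder(df, id, date)

df[, c("lastevent", "tae") := {
  d    <- as.numeric(date)                       # dates as day counts
  last <- cummax(ifelse(event == 1, d, -Inf))    # latest event on or before this row
  prev <- shift(last, fill = -Inf)               # latest event strictly before this row
  list(
    as.Date(ifelse(is.finite(last), last, NA_real_), origin = "1970-01-01"),
    as.numeric(ifelse(is.finite(prev), d - prev, NA_real_))
  )
}, by = id]

print(df)
```

Because everything happens inside by = id, the first event of id 2 has no earlier event to compare against and gets NA, which is the row 6 behaviour asked for in the question.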