Cumulative days passed since a recurring event, by group
I want to calculate the cumulative number of days passed since the last row where event == 1. Can this be done in R with data.table?
Expected result:
    id       date event passed
 1:  A 2000-01-13     1      0
 2:  A 2000-01-18     0      5
 3:  A 2000-01-25     0     12
 4:  A 2000-01-31     1      0
 5:  B 2012-10-10     1      0
 6:  B 2012-10-11     0      1
 7:  B 2012-10-14     1      0
 8:  B 2012-10-15     0      1
 9:  C 2005-07-25     1      0
10:  C 2005-07-31     0      6
library(data.table)

df <- data.table(
  id = c("A", "A", "A", "A",
         "B", "B", "B", "B",
         "C", "C"),
  date = c("2000-01-13", "2000-01-18", "2000-01-25", "2000-01-31", # A
           "2012-10-10", "2012-10-11", "2012-10-14", "2012-10-15", # B
           "2005-07-25", "2005-07-31"),                            # C
  event = c(1, 0, 0, 0,
            0, 0, 1, 0,
            1, 0)
)
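Edit (12/12/17): I tried @Psidom's solution.
It requires the data to be sorted by id and date, which is not a problem. However, note row 6: one day is counted there, although the result should be 0 because no event has occurred yet in that group.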
df2 <- df[sample(nrow(df)), ]  # shuffle the rows
df2 <- df2[order(id, date)]    # sort by id and date again
df2[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = .(id, cumsum(event))]
    id       date event days_from_start
 1:  A 2000-01-13     1               0
 2:  A 2000-01-18     0               5
 3:  A 2000-01-25     0              12
 4:  A 2000-01-31     0              18
 5:  B 2012-10-10     0               0
 6:  B 2012-10-11     0               1
 7:  B 2012-10-14     1               0
 8:  B 2012-10-15     0               1
 9:  C 2005-07-25     1               0
10:  C 2005-07-31     0               6
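One possible way to get 0 on rows that come before the first event of a group (row 6 above) is sketched below. It is not part of @Psidom's answer; the helper column grp and the result column passed are names chosen here for illustration, and the sketch assumes event contains only 0 and 1.

```r
library(data.table)

# Same data as in the question's data.table() call
df <- data.table(
  id    = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C"),
  date  = as.Date(c("2000-01-13", "2000-01-18", "2000-01-25", "2000-01-31",
                    "2012-10-10", "2012-10-11", "2012-10-14", "2012-10-15",
                    "2005-07-25", "2005-07-31")),
  event = c(1, 0, 0, 0, 0, 0, 1, 0, 1, 0)
)
setorder(df, id, date)

# grp (hypothetical helper) counts the events seen so far within each id;
# grp == 0 means no event has occurred yet in that id
df[, grp := cumsum(event), by = id]

# Days since the first row of each (id, grp) block, i.e. since the latest event
df[, passed := as.numeric(date - date[1]), by = .(id, grp)]

# Rows before the first event have no reference date: set them to 0
df[grp == 0, passed := 0]
df[, grp := NULL]
```

With the data above, row 6 (B, 2012-10-11) then gets 0 instead of 1. Using NA_real_ in place of 0 in the last assignment would instead mark those rows as undefined, if that fits the analysis better.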
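If the event column contains only 0 and 1, you can create a grouping variable with cumsum(event), which starts a new group every time event is 1; then group by this new variable and compute the cumulative number of days: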
df[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = cumsum(event)]
#                                                               ^^^^^^^^^^^^^
df
#     id       date event days_from_start
#  1:  A 2000-01-13     1               0
#  2:  A 2000-01-18     0               5
#  3:  A 2000-01-25     0              12
#  4:  A 2000-01-31     1               0
#  5:  B 2012-10-10     1               0
#  6:  B 2012-10-11     0               1
#  7:  B 2012-10-14     1               0
#  8:  B 2012-10-15     0               1
#  9:  C 2005-07-25     1               0
# 10:  C 2005-07-31     0               6
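To make the grouping step visible, the sketch below stores cumsum(event) in a helper column before grouping. The column name grp is hypothetical, and the event values are taken from the output printed above, where every id starts with event == 1.

```r
library(data.table)

# Data as printed in the answer's output above
df <- data.table(
  id    = rep(c("A", "B", "C"), times = c(4, 4, 2)),
  date  = c("2000-01-13", "2000-01-18", "2000-01-25", "2000-01-31",
            "2012-10-10", "2012-10-11", "2012-10-14", "2012-10-15",
            "2005-07-25", "2005-07-31"),
  event = c(1, 0, 0, 1, 1, 0, 1, 0, 1, 0)
)

# grp (hypothetical helper) increases by one on every row where event == 1
df[, grp := cumsum(event)]
df$grp
# 1 1 1 2 3 3 4 4 5 5

# Within each block, accumulate the day differences between consecutive rows
df[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = grp]
```

Grouping by .(id, cumsum(event)), as in the question's edit, additionally keeps a block from running across two ids when an id does not start with an event.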