Group data.table dates into groups by consecutive time intervals (split by gaps)
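I have a data.table that contains many events for different clients ("client.no"), and I want to split the events of each client at every gap ("missing event").
For example, suppose I have monthly event data. One or more months without an event count as a "gap", and events in consecutive months belong to the same group: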
library(data.table)
library(lubridate) # for ymd()
dt <- data.table(client.no = c(rep("Client_A", 3), rep("Client_B", 5), rep("Client_C", 2)),
event.date = ymd(20160101, 20160201, 20160301, 20151201, 20160101, 20160301, 20160501, 20160601, 20140701, 20150101))
which gives dt:
client.no event.date
1: Client_A 2016-01-01
2: Client_A 2016-02-01
3: Client_A 2016-03-01
4: Client_B 2015-12-01
5: Client_B 2016-01-01
6: Client_B 2016-03-01
7: Client_B 2016-05-01
8: Client_B 2016-06-01
9: Client_C 2014-07-01
10: Client_C 2015-01-01
The result should be a group number that is identical for every row of the same group, for example:
client.no event.date group.no
1: Client_A 2016-01-01 1
2: Client_A 2016-02-01 1
3: Client_A 2016-03-01 1
4: Client_B 2015-12-01 1
5: Client_B 2016-01-01 1
6: Client_B 2016-03-01 2
7: Client_B 2016-05-01 3
8: Client_B 2016-06-01 3
9: Client_C 2014-07-01 1
10: Client_C 2015-01-01 2
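The group number does not have to restart at one for each client (but that would be nice).
You can assume that the events are sorted within each client and that no client has duplicate event dates.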
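If those two assumptions are not guaranteed for real data, they can be established before grouping. The following is a minimal sketch, assuming the same column names as above; the table `events` and its values are made up for illustration.

```r
library(data.table)

# Hypothetical unsorted input with the same columns as dt
events <- data.table(
  client.no  = c("Client_B", "Client_A", "Client_B", "Client_A"),
  event.date = as.Date(c("2016-03-01", "2016-02-01", "2015-12-01", "2016-01-01"))
)

# Sort by client, then by date within each client (modifies events by reference)
setorder(events, client.no, event.date)

# anyDuplicated() returns 0 when no (client, date) pair occurs more than once
stopifnot(anyDuplicated(events, by = c("client.no", "event.date")) == 0L)
```

Sorting first matters because any approach based on differences between neighbouring rows only makes sense when the dates are in ascending order within each client.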
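You can use cumsum on a gap indicator. Consecutive month starts are at most 31 days apart, so a difference of more than 31 days means at least one month was skipped, and each such gap starts a new group: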
dt[,z:=cumsum(c(1,diff(event.date)>31)),by=client.no]
Output:
client.no event.date z
1: Client_A 2016-01-01 1
2: Client_A 2016-02-01 1
3: Client_A 2016-03-01 1
4: Client_B 2015-12-01 1
5: Client_B 2016-01-01 1
6: Client_B 2016-03-01 2
7: Client_B 2016-05-01 3
8: Client_B 2016-06-01 3
9: Client_C 2014-07-01 1
10: Client_C 2015-01-01 2
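The 31-day threshold relies on every event falling on the first of a month. As a possible alternative, here is a sketch that compares calendar months instead of day counts, so it would also work if events fell on arbitrary days of the month. It assumes the same sample data; the helper columns `month.idx` and `new.group` and the extra column `group.id` are names introduced here for illustration. It uses data.table's own `year()` and `month()` helpers.

```r
library(data.table)

dt <- data.table(
  client.no  = c(rep("Client_A", 3), rep("Client_B", 5), rep("Client_C", 2)),
  event.date = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01",
                         "2015-12-01", "2016-01-01", "2016-03-01",
                         "2016-05-01", "2016-06-01",
                         "2014-07-01", "2015-01-01"))
)

# Count months on a single scale so that December -> January is a step of 1
dt[, month.idx := year(event.date) * 12L + month(event.date)]

# A new group starts at each client's first row and after every step > 1 month
dt[, group.no := cumsum(c(TRUE, diff(month.idx) > 1L)), by = client.no]

# Variant: ids that are unique across clients (no reset per client)
dt[, new.group := c(TRUE, diff(month.idx) > 1L), by = client.no]
dt[, group.id := cumsum(new.group)]

# Drop the helper columns again
dt[, c("month.idx", "new.group") := NULL]
```

On the sample data, `group.no` matches the `z` column shown above, while `group.id` keeps counting across clients instead of restarting at one.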