
Group data.table dates into groups by consecutive time intervals (split by gaps)

I have a data.table with many events for different customers ("clients") and want to split each customer's events into groups at every gap (a "missing event").

For example, suppose I have monthly event data. One or more months without an event count as a gap, while events in successive months belong to the same group:

library(data.table)
library(lubridate)   # for ymd()
dt <- data.table(client.no = c(rep("Client_A", 3), rep("Client_B", 5), rep("Client_C", 2)),
                 event.date = ymd(20160101, 20160201, 20160301, 20151201, 20160101, 20160301, 20160501, 20160601, 20140701, 20150101))

This gives the following dt:

    client.no event.date
 1:  Client_A 2016-01-01
 2:  Client_A 2016-02-01
 3:  Client_A 2016-03-01
 4:  Client_B 2015-12-01
 5:  Client_B 2016-01-01
 6:  Client_B 2016-03-01
 7:  Client_B 2016-05-01
 8:  Client_B 2016-06-01
 9:  Client_C 2014-07-01
10:  Client_C 2015-01-01

The result should be a group number that is the same for every row of the same group, e.g.:

    client.no event.date group.no
 1:  Client_A 2016-01-01        1
 2:  Client_A 2016-02-01        1
 3:  Client_A 2016-03-01        1
 4:  Client_B 2015-12-01        1
 5:  Client_B 2016-01-01        1
 6:  Client_B 2016-03-01        2
 7:  Client_B 2016-05-01        3
 8:  Client_B 2016-06-01        3
 9:  Client_C 2014-07-01        1
10:  Client_C 2015-01-01        2

The group number does not have to restart at one for each client, but that would be nice.

You can assume that the events are ordered within each client and that there are no duplicated event dates within the same client.

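You can use cumsum on a per-client gap indicator. diff(event.date) gives the number of days between successive events, and since no month is longer than 31 days, a difference of more than 31 days means at least one month was skipped and a new group starts:

dt[, z := cumsum(c(1, diff(event.date) > 31)), by = client.no]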

Output:

   client.no event.date z
 1:  Client_A 2016-01-01 1
 2:  Client_A 2016-02-01 1
 3:  Client_A 2016-03-01 1
 4:  Client_B 2015-12-01 1
 5:  Client_B 2016-01-01 1
 6:  Client_B 2016-03-01 2
 7:  Client_B 2016-05-01 3
 8:  Client_B 2016-06-01 3
 9:  Client_C 2014-07-01 1
10:  Client_C 2015-01-01 2
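
The 31-day threshold works here because every event falls on the first of a month. If events can fall on arbitrary days of the month, a possible variant is to compare calendar months instead of day counts. The following is only a sketch under that assumption: the helper column month.idx is a name made up for this illustration, and the dates are built with as.Date() rather than ymd() so the example runs without lubridate.

```r
library(data.table)

dt <- data.table(
  client.no  = c(rep("Client_A", 3), rep("Client_B", 5), rep("Client_C", 2)),
  event.date = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01",
                         "2015-12-01", "2016-01-01", "2016-03-01",
                         "2016-05-01", "2016-06-01",
                         "2014-07-01", "2015-01-01"))
)

# Hypothetical helper column: a running month counter, so that two
# consecutive calendar months always differ by exactly 1
dt[, month.idx := data.table::year(event.date) * 12L +
                  data.table::month(event.date)]

# The first row of each client opens group 1; every jump of more than
# one month opens the next group (assumes rows are sorted within client)
dt[, group.no := cumsum(c(TRUE, diff(month.idx) > 1L)), by = client.no]

# Drop the helper column again
dt[, month.idx := NULL]

print(dt)
```

Because the cumulative sum is computed by client.no, the group number restarts at 1 for each client, as asked for in the question.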
