
R: Insert new daily rows between start and end date per ID in data.table

I have a large data table that contains start and end dates of events per ID:

library(data.table)
dt = data.table(
    ID = c(1,1,2,2),
    STARTDATE = as.Date(c("2011-10-10","2011-10-13","2011-10-10","2011-10-13"),format = "%Y-%m-%d"),
    ENDDATE = as.Date(c("2011-10-12","2011-10-15","2011-10-12","2011-10-15"),format = "%Y-%m-%d")
)
dt   
>   ID  STARTDATE    ENDDATE
>1:  1 2011-10-10 2011-10-12
>2:  1 2011-10-13 2011-10-15
>3:  2 2011-10-10 2011-10-12
>4:  2 2011-10-13 2011-10-15

I would like to expand this data table so that it has one row per ID and day within each time window. The expected result is:

    STARTDATE    ENDDATE ID      DAILY
1: 2011-10-10 2011-10-12  1 2011-10-10
2: 2011-10-10 2011-10-12  1 2011-10-11
3: 2011-10-10 2011-10-12  1 2011-10-12
4: 2011-10-13 2011-10-15  1 2011-10-13
5: 2011-10-13 2011-10-15  1 2011-10-14
6: 2011-10-13 2011-10-15  1 2011-10-15
7: 2011-10-10 2011-10-12  2 2011-10-10
8: 2011-10-10 2011-10-12  2 2011-10-11
9: 2011-10-10 2011-10-12  2 2011-10-12
10: 2011-10-13 2011-10-15  2 2011-10-13
11: 2011-10-13 2011-10-15  2 2011-10-14
12: 2011-10-13 2011-10-15  2 2011-10-15

My code looks as follows:

dt[, cbind(.SD, seq(STARTDATE, ENDDATE, 1)), by = list(STARTDATE, ENDDATE)] 

but it does not generate the wanted result:

    STARTDATE    ENDDATE ID         V2
1: 2011-10-10 2011-10-12  1 2011-10-10
2: 2011-10-10 2011-10-12  2 2011-10-11
3: 2011-10-10 2011-10-12  1 2011-10-12
4: 2011-10-13 2011-10-15  1 2011-10-13
5: 2011-10-13 2011-10-15  2 2011-10-14
6: 2011-10-13 2011-10-15  1 2011-10-15
Warning messages:
1: In data.table::data.table(...) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving remainder of 1 items)
2: In data.table::data.table(...) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving remainder of 1 items)

The ID needs to go in somewhere, but when I add it to the by argument I get another error. Any ideas?

Here is one option. Using by = 1:nrow(dt) makes each row its own group, so seq() receives a single start date and a single end date per call. This grouping adds a helper column called nrow to the result, which we can then remove with [, nrow := NULL].

library(data.table)

dt2 <- dt[, .(STARTDATE, ENDDATE, ID, 
              DAILY = seq(STARTDATE, ENDDATE, by = 1)), 
          by = 1:nrow(dt)][, nrow := NULL]
print(dt2[])
#      STARTDATE    ENDDATE ID      DAILY
#  1: 2011-10-10 2011-10-12  1 2011-10-10
#  2: 2011-10-10 2011-10-12  1 2011-10-11
#  3: 2011-10-10 2011-10-12  1 2011-10-12
#  4: 2011-10-13 2011-10-15  1 2011-10-13
#  5: 2011-10-13 2011-10-15  1 2011-10-14
#  6: 2011-10-13 2011-10-15  1 2011-10-15
#  7: 2011-10-10 2011-10-12  2 2011-10-10
#  8: 2011-10-10 2011-10-12  2 2011-10-11
#  9: 2011-10-10 2011-10-12  2 2011-10-12
# 10: 2011-10-13 2011-10-15  2 2011-10-13
# 11: 2011-10-13 2011-10-15  2 2011-10-14
# 12: 2011-10-13 2011-10-15  2 2011-10-15
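
As an alternative sketch (not part of the original answer), grouping by ID together with both date columns should give the same rows without the helper column. The original attempt grouped only by STARTDATE and ENDDATE, so each group held two rows (one per ID) while seq() produced three dates, which is what triggered the recycling warning. This version assumes that each combination of ID, STARTDATE and ENDDATE occurs only once in the data; the name dt3 is only illustrative.

library(data.table)

dt <- data.table(
    ID = c(1, 1, 2, 2),
    STARTDATE = as.Date(c("2011-10-10", "2011-10-13", "2011-10-10", "2011-10-13")),
    ENDDATE = as.Date(c("2011-10-12", "2011-10-15", "2011-10-12", "2011-10-15"))
)

# Group by all three columns so that every group is one original row.
# Inside j the grouping columns are single values, so seq() gets scalars.
dt3 <- dt[, .(DAILY = seq(STARTDATE, ENDDATE, by = "day")),
          by = .(ID, STARTDATE, ENDDATE)]
print(dt3)

The column order differs from the expected output (ID comes first here); setcolorder() can rearrange it if that matters. If the same ID and window can appear more than once, the by = 1:nrow(dt) approach above is the safer choice, because duplicate rows would otherwise collapse into one group.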

