R: Insert new daily rows between start and end date per ID in data.table

I have a large data.table that contains the start and end dates of events for each ID:

library(data.table)
dt = data.table(
    ID = c(1,1,2,2),
    STARTDATE = as.Date(c("2011-10-10","2011-10-13","2011-10-10","2011-10-13"),format = "%Y-%m-%d"),
    ENDDATE = as.Date(c("2011-10-12","2011-10-15","2011-10-12","2011-10-15"),format = "%Y-%m-%d")
)
dt   
>   ID  STARTDATE    ENDDATE
>1:  1 2011-10-10 2011-10-12
>2:  1 2011-10-13 2011-10-15
>3:  2 2011-10-10 2011-10-12
>4:  2 2011-10-13 2011-10-15

I want to add one row to this data.table for each ID and each day within the time window. The expected result looks like this:

    STARTDATE    ENDDATE ID      DAILY
1: 2011-10-10 2011-10-12  1 2011-10-10
2: 2011-10-10 2011-10-12  1 2011-10-11
3: 2011-10-10 2011-10-12  1 2011-10-12
4: 2011-10-13 2011-10-15  1 2011-10-13
5: 2011-10-13 2011-10-15  1 2011-10-14
6: 2011-10-13 2011-10-15  1 2011-10-15
7: 2011-10-10 2011-10-12  2 2011-10-10
8: 2011-10-10 2011-10-12  2 2011-10-11
9: 2011-10-10 2011-10-12  2 2011-10-12
10: 2011-10-13 2011-10-15  2 2011-10-13
11: 2011-10-13 2011-10-15  2 2011-10-14
12: 2011-10-13 2011-10-15  2 2011-10-15

My code is as follows:

dt[, cbind(.SD, seq(STARTDATE, ENDDATE, 1)), by = list(STARTDATE, ENDDATE)] 

But it does not produce the desired result:

    STARTDATE    ENDDATE ID         V2
1: 2011-10-10 2011-10-12  1 2011-10-10
2: 2011-10-10 2011-10-12  2 2011-10-11
3: 2011-10-10 2011-10-12  1 2011-10-12
4: 2011-10-13 2011-10-15  1 2011-10-13
5: 2011-10-13 2011-10-15  2 2011-10-14
6: 2011-10-13 2011-10-15  1 2011-10-15
Warning messages:
1: In data.table::data.table(...) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving remainder of 1 items)
2: In data.table::data.table(...) :
  Item 1 is of size 2 but maximum size is 3 (recycled leaving remainder of 1 items)

The ID values do not end up where they should, but I cannot put ID into the by part of the data.table call. That gives another error. Any ideas?

Here is one option. Note that we can use by = 1:nrow(dt) to group by each row, which creates a new column named nrow. We can then use [, nrow := NULL] to remove that column.

library(data.table)

dt2 <- dt[, .(STARTDATE, ENDDATE, ID, 
              DAILY = seq(STARTDATE, ENDDATE, by = 1)), 
          by = 1:nrow(dt)][, nrow := NULL]
print(dt2[])
#      STARTDATE    ENDDATE ID      DAILY
#  1: 2011-10-10 2011-10-12  1 2011-10-10
#  2: 2011-10-10 2011-10-12  1 2011-10-11
#  3: 2011-10-10 2011-10-12  1 2011-10-12
#  4: 2011-10-13 2011-10-15  1 2011-10-13
#  5: 2011-10-13 2011-10-15  1 2011-10-14
#  6: 2011-10-13 2011-10-15  1 2011-10-15
#  7: 2011-10-10 2011-10-12  2 2011-10-10
#  8: 2011-10-10 2011-10-12  2 2011-10-11
#  9: 2011-10-10 2011-10-12  2 2011-10-12
# 10: 2011-10-13 2011-10-15  2 2011-10-13
# 11: 2011-10-13 2011-10-15  2 2011-10-14
# 12: 2011-10-13 2011-10-15  2 2011-10-15
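
As a possible alternative (a sketch only, not part of the answer above), ID can also go into by together with the two date columns. This assumes each combination of ID, STARTDATE and ENDDATE appears only once; duplicated combinations would be collapsed into a single group. The names group_sizes and dt3 are made up for illustration. The first call also shows where the recycling warning in the question likely comes from: each (STARTDATE, ENDDATE) group holds two rows, while seq() returns three dates.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(
  ID = c(1, 1, 2, 2),
  STARTDATE = as.Date(c("2011-10-10", "2011-10-13", "2011-10-10", "2011-10-13")),
  ENDDATE   = as.Date(c("2011-10-12", "2011-10-15", "2011-10-12", "2011-10-15"))
)

# Grouping by the dates alone puts two rows (ID 1 and ID 2) in each group,
# so a 2-row .SD gets bound to a 3-element date sequence.
group_sizes <- dt[, .N, by = .(STARTDATE, ENDDATE)]

# Grouping by ID as well makes every group a single row. Inside j the
# grouping columns are length-one values, so seq() gets scalar endpoints.
# Assumption: (ID, STARTDATE, ENDDATE) is unique per row.
dt3 <- dt[, .(DAILY = seq(STARTDATE, ENDDATE, by = "day")),
          by = .(ID, STARTDATE, ENDDATE)]
```

With this grouping the columns come out as ID, STARTDATE, ENDDATE, DAILY, and no helper column has to be dropped afterwards. If duplicate combinations are possible, the by = 1:nrow(dt) approach above is the safer choice.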
