How do I insert consecutive rows into a data.table in R (example given)?

Question

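df is a data.table and df_expected is the result I want. For every customer/location_id combination I want one row for each hour from 0 to 23, with visits filled with 0 for the newly added hours.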

df<-data.table(customer=c("x","x","x","y","y"),location_id=c(1,1,1,2,3),hour=c(2,5,7,0,4),visits=c(40,50,60,70,80))

df_expected <- data.table(
  customer    = rep(c("x", "y"), times = c(24, 48)),
  location_id = rep(c(1, 2, 3), each = 24),
  hour        = rep(as.numeric(0:23), times = 3),
  visits      = c(0, 0, 40, 0, 0, 50, 0, 60, rep(0, 16),  # x, location 1
                  70, rep(0, 23),                         # y, location 2
                  0, 0, 0, 0, 80, rep(0, 19))             # y, location 3
)

This is how I tried to get the result, but it did not work:

df1<-df[,':='(hour=seq(0:23)),by=(customer)]
Error in `[.data.table`(df, , `:=`(hour = seq(0L:23L)), by = (customer)) : 
Type of RHS ('integer') must match LHS ('double'). To check and coerce would impact 
performance too much for the fastest cases. Either change the type of the target column, or 
coerce the RHS of := yourself (e.g. by using 1L instead of 1)
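Two things go wrong in this attempt. seq(0:23) is seq_along(0:23), i.e. the integers 1 to 24 rather than the hours 0 to 23, and its integer type clashes with the double hour column, which is what the error reports. More fundamentally, := only assigns into rows that already exist, so it cannot add rows at all. A minimal sketch of a row-generating alternative, assuming the df defined above: a grouped j that returns 24 values produces 24 rows per customer/location_id group (grid is an illustrative name).

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"), location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4), visits = c(40, 50, 60, 70, 80))

# seq(0:23) means seq_along(0:23): the integers 1..24, not the hours 0..23
print(seq(0:23))

# A grouped j that returns 24 values yields 24 rows per group, which := can never do.
# as.numeric() keeps hour a double, matching the type of df$hour.
grid <- df[, .(hour = as.numeric(0:23)), by = .(customer, location_id)]

# Three groups, (x, 1), (y, 2) and (y, 3), each with 24 rows
print(grid[, .N, by = .(customer, location_id)])
```

Joining df back onto such a grid and setting the unmatched visits to 0 then gives the expected table, which is the route the answers take.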

Answer 1

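Here is one approach: build the target first, then use a join to add the visits information. The ifelse statement just cleans up the NAs the join produces for hours with no visits. You could also keep those NAs in the new data.table and replace them with :=. This first version joins only on customer and hour and does not carry location_id; the edit below handles that.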

target <- data.table(
  customer = rep(unique(df$customer), each = 24),
  hour = 0:23)

df_join <- df[target, on = c("customer", "hour"), 
   .(customer, hour, visits = ifelse(is.na(visits), 0, visits))
   ]

all.equal(df_expected, df_join)
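The alternative mentioned at the start of this answer, leaving the NAs in place and overwriting them with :=, could look roughly like this. It is a sketch rather than the answer's own code: it assumes a location_id-aware target built with a grouped j (the target above has no location_id column), does a plain join, and then fixes the NAs by reference.

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"), location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4), visits = c(40, 50, 60, 70, 80))

# Assumed target: every (customer, location_id) pair in df crossed with hours 0..23
target <- df[, .(hour = as.numeric(0:23)), by = .(customer, location_id)]

# Plain join: x[i] keeps every row of target; hours with no match get visits = NA
df_join <- df[target, on = .(customer, location_id, hour)]

# Overwrite the NAs by reference instead of using ifelse() inside j
df_join[is.na(visits), visits := 0]
df_join
```

Compared with ifelse() inside j, this keeps the join itself plain and makes the NA handling a separate, in-place step.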

Edit:

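This addresses the request to include the location_id column. One way is to group with by = location_id when creating the target. I also borrowed some code from chinsoon12's answer.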

target <- df[ , .("customer" = rep(unique(customer), each = 24L),
                  "hour" = rep(0L:23L, times = uniqueN(customer))),
              by = location_id]

df_join <- df[target, on = .NATURAL, 
              .(customer, location_id, hour, visits = fcoalesce(visits, 0))]

all.equal(df_expected, df_join)

Answer 2

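Another option: use CJ to generate your universe of rows, on=.NATURAL to join on the columns that share a name, and fcoalesce to handle the NAs. As written, it covers customer and hour but not location_id: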

df[CJ(customer, hour=0L:23L, unique=TRUE), on=.NATURAL, allow.cartesian=TRUE, 
    .(customer=i.customer, hour=i.hour, visits=fcoalesce(visits, 0))]
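To see what the two helpers in this answer do in isolation, here is a toy sketch with made-up vectors: CJ() with unique = TRUE builds the cross join of the distinct values of its arguments, and fcoalesce() replaces each NA with the value from the next argument.

```r
library(data.table)

# Cross join of the distinct customers with three hours; the repeated "x" is
# dropped by unique = TRUE, so the result has 2 x 3 = 6 rows (sorted by default)
u <- CJ(customer = c("x", "x", "y"), hour = 0:2, unique = TRUE)
print(u)

# Each NA is replaced by the corresponding value of the next argument (a constant 0 here)
filled <- fcoalesce(c(40, NA, 60), 0)
print(filled)
```

Note that fcoalesce() expects its arguments to share a type, which is why the answer passes the double 0 rather than the integer 0L for the double visits column.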

Answer 3

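Here is an answer using for loops. The loop below builds the 24 hourly rows (0 to 23) for a single customer, x, which only appears at location_id 1.

df_final <- data.table()
for (h in 0:23) {                                # hours run from 0 to 23
  if (h %in% df[customer == "x", hour]) {
    a <- df[customer == "x" & hour == h]         # keep the recorded row
  } else {
    a <- data.table(customer = "x", location_id = 1, hour = h, visits = 0)
  }
  df_final <- rbind(df_final, a)
}
df_final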

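You can wrap this in an outer loop over every customer/location_id pair to handle several customers (x, y, etc.); y needs two passes because it appears at two locations. The loop below is not very clean, but it gets the job done.

df_final <- data.table()

# One pass per (customer, location_id) pair: (x, 1), (y, 2), (y, 3)
keys <- unique(df[, .(customer, location_id)])

for (k in seq_len(nrow(keys))) {
  cust <- keys$customer[k]
  loc  <- keys$location_id[k]

  for (h in 0:23) {
    # Recorded row for this customer, location and hour, if there is one
    a <- df[customer == cust & location_id == loc & hour == h]
    if (nrow(a) == 0) {
      # No visits recorded for this hour: add a row with visits = 0
      a <- data.table(customer = cust, location_id = loc, hour = h, visits = 0)
    }
    df_final <- rbind(df_final, a)
  }
}

df_final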
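Growing df_final with rbind() inside the loop copies the accumulated table on every iteration, which can get slow as the number of groups grows. A common variant, sketched here under the same assumptions about df, stores each piece in a pre-allocated list and calls rbindlist() once at the end; keys, pieces and hit are illustrative names.

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"), location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4), visits = c(40, 50, 60, 70, 80))

keys <- unique(df[, .(customer, location_id)])
pieces <- vector("list", nrow(keys) * 24L)   # one slot per (pair, hour)
n <- 0L

for (k in seq_len(nrow(keys))) {
  cust <- keys$customer[k]
  loc  <- keys$location_id[k]
  for (h in 0:23) {
    n <- n + 1L
    hit <- df[customer == cust & location_id == loc & hour == h]
    if (nrow(hit) == 0) {
      # Missing hour: build a zero-visit row (hour stored as double, like df$hour)
      hit <- data.table(customer = cust, location_id = loc, hour = as.numeric(h), visits = 0)
    }
    pieces[[n]] <- hit
  }
}

# A single bind at the end instead of one rbind() per iteration
df_final <- rbindlist(pieces)
df_final
```

The loop logic is unchanged; only the accumulation strategy differs, so the rows come out in the same order as with the rbind() version.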

3 answers

Answer 1: 2 votes, accepted, 2020-02-20 13:41:35

Answer 2: 1 vote, 2020-02-21 00:33:12

Answer 3: 0 votes, 2020-02-20 13:07:17
