
How to insert sequential rows in a data.table in R (example given)?

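df is a data.table and df_expected is the desired data.table. For each customer and location_id combination I want the hour column to run from 0 to 23, with visits filled with 0 for the newly added hours.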

library(data.table)

df<-data.table(customer=c("x","x","x","y","y"),location_id=c(1,1,1,2,3),hour=c(2,5,7,0,4),visits=c(40,50,60,70,80))

df_expected<-data.table(customer=c("x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x",
                               "y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y",
                               "y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y"),

                    location_id=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
                                  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
                                  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3),

                    hour=c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23),

                    visits=c(0,0,40,0,0,50,0,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                             70,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                             0,0,0,0,80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))

Here is what I tried to get the result, but it didn't work:

df1<-df[,':='(hour=seq(0:23)),by=(customer)]
Error in `[.data.table`(df, , `:=`(hour = seq(0L:23L)), by = (customer)) : 
Type of RHS ('integer') must match LHS ('double'). To check and coerce would impact 
performance too much for the fastest cases. Either change the type of the target column, or 
coerce the RHS of := yourself (e.g. by using 1L instead of 1)
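A quick check of why the attempt above cannot work, assuming the df from the question. seq(0:23) calls seq() with a single vector, which behaves like seq_along() and returns the positions 1 to 24 rather than the hours 0 to 23. The literal 0:23 is an integer vector while df$hour is double, which is what the error message complains about. And := only updates columns of rows that already exist, so even with matching types it cannot add the missing hours; that needs a join or rbind, as in the answers below.

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# seq() given one vector of length 24 acts like seq_along(): positions 1..24
pos <- seq(0:23)
print(range(pos))        # shows 1 and 24, not 0 and 23

# the hours actually wanted are 0:23, which is an integer vector
hours <- 0:23
print(typeof(hours))     # "integer"

# c(2, 5, 7, 0, 4) is double, hence "Type of RHS ('integer') must match LHS ('double')"
print(typeof(df$hour))   # "double"
```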

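Here is a way to create a target and then use a join to add the visits information. The ifelse statement just helps clean up the NAs from the merge; you could also keep them in the new data.table and replace them with :=.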

target <- data.table(
  customer = rep(unique(df$customer), each = 24),
  hour = 0:23)

df_join <- df[target, on = c("customer", "hour"), 
   .(customer, hour, visits = ifelse(is.na(visits), 0, visits))
   ]

all.equal(df_expected, df_join)
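The alternative mentioned in this answer, keeping the NAs from the join and then overwriting them with :=, could look roughly like this. It is a sketch that assumes the df from the question and, like the answer's code, ignores location_id; hour is built as double here only so that it has the same type as df$hour.

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# every customer paired with every hour 0-23
target <- data.table(
  customer = rep(unique(df$customer), each = 24),
  hour     = rep(as.numeric(0:23), times = uniqueN(df$customer)))

# join df onto target; without j, the join columns carry target's values and
# hours with no matching row in df get visits = NA
df_join <- df[target, on = c("customer", "hour")][, .(customer, hour, visits)]

# replace the NAs in place instead of using ifelse()
df_join[is.na(visits), visits := 0]
```

The update by reference avoids building a second copy of the visits column, which is the main reason to prefer := over ifelse() on large tables.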

Edit:

This addresses the request to include the location_id column. One way is to use by = location_id when creating the target. I also added some code from chinsoon12's answer.

target <- df[ , .("customer" = rep(unique(customer), each = 24L),
                  "hour" = rep(0L:23L, times = uniqueN(customer))),
              by = location_id]

df_join <- df[target, on = .NATURAL, 
              .(customer, location_id, hour, visits = fcoalesce(visits, 0))]

all.equal(df_expected, df_join)
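To see what by = location_id contributes, here is a small check of the target built in this edit, assuming the question's df. Inside each location group, unique(customer) is repeated 24 times and 0:23 is repeated once per customer, so every customer/location pair gets one block of hours 0 to 23. For this df the count table has three rows (x at 1, y at 2, y at 3), each with N = 24.

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# same construction as in the edit: one block of 24 hours per customer in each location
target <- df[, .(customer = rep(unique(customer), each = 24L),
                 hour     = rep(0L:23L, times = uniqueN(customer))),
             by = location_id]

# count rows per location_id/customer pair
counts <- target[, .N, by = .(location_id, customer)]
print(counts)
```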

Another option uses CJ to generate your universe, on=.NATURAL to join on columns with the same names, and fcoalesce to handle the NAs:

df[CJ(customer, hour=0L:23L, unique=TRUE), on=.NATURAL, allow.cartesian=TRUE, 
    .(customer=i.customer, hour=i.hour, visits=fcoalesce(visits, 0))]
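CJ(customer, hour=0L:23L, unique=TRUE) crosses every customer with every hour, so it leaves location_id out; adding location_id to the CJ call would also create pairs that never occur, such as customer x at location 3. A sketch of one way to keep only the customer/location pairs that exist, assuming the question's df (the names combos, target and res are just illustrative), is to take the unique pairs, expand each to 24 hours with by, and then do the same .NATURAL join and fcoalesce:

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# the customer/location_id pairs that actually occur, each expanded to hours 0-23
combos <- unique(df[, .(customer, location_id)])
target <- combos[, .(hour = as.numeric(0:23)), by = .(customer, location_id)]

# join on the shared columns (customer, location_id, hour); unmatched hours get NA
res <- df[target, on = .NATURAL]

# fill the gaps with 0
res[, visits := fcoalesce(visits, 0)]
```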

Here is a for-loop answer for a single customer (x):

df_final <- data.table()
for(i in 0:23){
  if(i %in% df[customer=="x", hour]){
    a <- df[customer=="x" & hour==i]
  }else{
    # customer x only appears at location_id 1 in df
    a <- data.table(customer="x", location_id=1, hour=i, visits=0)}

  df_final <- rbind(df_final, a)
}
df_final

You can wrap this in an outer for loop to handle multiple customers x, y, etc. (the loop below is not very clean, but it gets the job done).

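df_final <- data.table()

for(j in unique(df[,customer])){

  # y appears at two locations, so also loop over each customer's location_id values
  for(loc in unique(df[customer==j, location_id])){

    for(i in 0:23){

      # the existing row for this customer, location and hour, if there is one
      a <- df[customer==j & location_id==loc & hour==i]
      if(nrow(a) == 0){
        # no visits recorded for this hour: add a row with visits = 0
        a <- data.table(customer=j, location_id=loc, hour=i, visits=0)
      }

      df_final <- rbind(df_final, a)
    }
  }
}

df_final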
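Calling rbind() inside a loop copies the accumulated table on every iteration, which becomes slow as the number of customers and hours grows. A common alternative, sketched here with the same per-customer, per-location, per-hour idea (the helper fill_hours is a hypothetical name, and the question's df is assumed), is to build the pieces in a list and bind them once with rbindlist():

```r
library(data.table)

df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# hypothetical helper: the 24 hourly rows for one customer at one location
fill_hours <- function(cust, loc) {
  rows <- lapply(0:23, function(h) {
    a <- df[customer == cust & location_id == loc & hour == h]
    if (nrow(a) == 0) {
      # hour stored as double so every piece matches df's column types
      a <- data.table(customer = cust, location_id = loc,
                      hour = as.numeric(h), visits = 0)
    }
    a
  })
  rbindlist(rows)
}

# one call per customer/location pair, then a single bind at the end
combos   <- unique(df[, .(customer, location_id)])
df_final <- rbindlist(Map(fill_hours, combos$customer, combos$location_id))
```

This is still a row-by-row approach; the join-based answers above do the same work in one vectorised step and are usually the better choice.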


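Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.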

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM