How to insert sequential rows in a data.table in R (example given)?

df is a data.table and df_expected is the desired data.table. I want to add an hour column running from 0 to 23, with visits filled in as 0 for the newly added hours.

library(data.table)

df <- data.table(customer = c("x","x","x","y","y"), location_id = c(1,1,1,2,3), hour = c(2,5,7,0,4), visits = c(40,50,60,70,80))

df_expected<-data.table(customer=c("x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x","x",
                               "y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y",
                               "y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y","y"),

                    location_id=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
                                  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
                                  3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3),

                    hour=c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,
                           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23),

                    visits=c(0,0,40,0,0,50,0,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                             70,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                             0,0,0,0,80,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))

This is what I tried to obtain my result, but it did not work:

df1<-df[,':='(hour=seq(0:23)),by=(customer)]
Error in `[.data.table`(df, , `:=`(hour = seq(0L:23L)), by = (customer)) : 
Type of RHS ('integer') must match LHS ('double'). To check and coerce would impact 
performance too much for the fastest cases. Either change the type of the target column, or 
coerce the RHS of := yourself (e.g. by using 1L instead of 1)

Here's an approach that creates the target and then uses a join to add in the visits information. The ifelse call just cleans up the NA values left by the join. You could also leave them in and replace them with := in the new data.table.

target <- data.table(
  customer = rep(unique(df$customer), each = 24),
  hour = 0:23)

df_join <- df[target, on = c("customer", "hour"), 
   .(customer, hour, visits = ifelse(is.na(visits), 0, visits))
   ]

all.equal(df_expected, df_join)
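
The alternative mentioned above, leaving the NA values in after the join and replacing them by reference with :=, could look roughly like the sketch below. It rebuilds df from the question so that it runs on its own, and it uses the i. prefix to take the join columns from target explicitly; that prefix is my choice here, not part of the code above.

```r
library(data.table)

# Same input as in the question.
df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# One row per customer and hour 0..23.
target <- data.table(customer = rep(unique(df$customer), each = 24),
                     hour = rep(0:23, times = uniqueN(df$customer)))

# Every row of target is kept; hours with no match in df get NA visits.
df_join <- df[target, on = c("customer", "hour"),
              .(customer = i.customer, hour = i.hour, visits)]

# Replace the NAs in place instead of using ifelse() inside the join.
df_join[is.na(visits), visits := 0]
```

Because := modifies df_join by reference, no reassignment is needed after the last line.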

Edit:

This addresses the request to include the location_id column. One way to do this is with by = location_id in the creation of the target. I've also added in some of the code from chinsoon12's answer.

target <- df[ , .("customer" = rep(unique(customer), each = 24L),
                  "hour" = rep(0L:23L, times = uniqueN(customer))),
              by = location_id]

df_join <- df[target, on = .NATURAL, 
              .(customer, location_id, hour, visits = fcoalesce(visits, 0))]

all.equal(df_expected, df_join)

Another option uses CJ to generate your universe, on=.NATURAL to join on identically named columns, and fcoalesce to handle NAs:

df[CJ(customer, hour=0L:23L, unique=TRUE), on=.NATURAL, allow.cartesian=TRUE, 
    .(customer=i.customer, hour=i.hour, visits=fcoalesce(visits, 0))]
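
The CJ call above builds the universe from customer and hour only. If location_id should be carried along as in df_expected, one possible variation is to expand each (customer, location_id) pair that actually occurs in the data, rather than cross-joining all values. This is only a sketch under that assumption; the name universe is made up for illustration, and the join keys are spelled out instead of using .NATURAL.

```r
library(data.table)

# Same input as in the question.
df <- data.table(customer = c("x", "x", "x", "y", "y"),
                 location_id = c(1, 1, 1, 2, 3),
                 hour = c(2, 5, 7, 0, 4),
                 visits = c(40, 50, 60, 70, 80))

# Each observed (customer, location_id) pair, expanded to hours 0..23.
# as.numeric() keeps hour the same type (double) as in df.
universe <- unique(df[, .(customer, location_id)])[
  , .(hour = as.numeric(0:23)), by = .(customer, location_id)]

# Keep every row of universe; unmatched hours come back with NA visits.
res <- df[universe, on = c("customer", "location_id", "hour")]

# Fill the missing visits with 0.
res[, visits := fcoalesce(visits, 0)]
```

Unlike a full cross join, this does not create rows for pairs that never occur together, such as customer x at location 2.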

Here's a for-loop answer.

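df_final <- data.table()
for (i in 0:23) {  # hours run from 0 to 23; seq(24) would give 1 to 24
  if (i %in% df[customer == "x", hour]) {
    # keep only the columns built in the else branch so rbind lines up
    a <- df[customer == "x" & hour == i, .(customer, hour, visits)]
  } else {
    a <- data.table(customer = "x", hour = i, visits = 0)
  }
  df_final <- rbind(df_final, a)
}
df_final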

You can wrap this in another for-loop to handle multiple customers x, y, etc. (the following loop isn't very clean, but it gets the job done).

df_final <- data.table()

for (j in unique(df[, customer])) {
  for (i in 0:23) {  # hours run from 0 to 23
    # rows for this customer and hour, restricted to the output columns
    a <- df[customer == j & hour == i, .(customer, hour, visits)]
    if (nrow(a) == 0) {
      # no visits recorded for this hour: add a zero row
      a <- data.table(customer = j, hour = i, visits = 0)
    }
    df_final <- rbind(df_final, a)
  }
}

df_final
