Is there a more efficient way to fill an extra column than a 'for' loop?

Question

I have a data.table with about 100k rows. I am going to simplify it to only 3 columns because that is all that is relevant here.

library(data.table)

dt <- data.table(indicator = c("x", "y"),  # recycled to x, y, x, y
                 date1 = c("20190111", "20190212", "20190512", "20190723"),
                 date2 = c("20190105", "20190215", "20190616", "20190623"))

What I want to do is assign either date1 or date2 to a new column, final_date, depending on the indicator column. If indicator is "x", final_date should be date1. If indicator is "y", final_date should be date2.

I am able to do this with a "for" loop and if/else statements, but it takes a few minutes to complete with 100k rows.

for (row in 1:nrow(dt)) {
  if(dt$indicator[row] == "x") {
    dt$final_date[row] <- dt$date1[row]
  } else {
    dt$final_date[row] <- dt$date2[row]
  }
}

Is there a more efficient way to do this, with data.table functionality or anything else?

Answer 1

With data.table, I would do something like this:

dt[, final_date := ifelse(indicator == "x", date1, date2)]

Really quick and simple! I suspect that on a large data set it will be faster than both dplyr and your loop, because ifelse() is vectorised over the whole column and data.table's := adds the column in place rather than creating a copy of the data.
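As a variation on the line above, here is a minimal sketch that swaps ifelse() for fifelse(), data.table's own type-strict version. This assumes an installed data.table version recent enough to provide fifelse(), and the indicator column is written out in full here instead of relying on recycling. I have not benchmarked it against the 100k-row data, so treat any speed gain as something to measure rather than a given.

```r
library(data.table)

# Same toy data as in the question, with indicator spelled out for all 4 rows
dt <- data.table(
  indicator = c("x", "y", "x", "y"),
  date1 = c("20190111", "20190212", "20190512", "20190723"),
  date2 = c("20190105", "20190215", "20190616", "20190623")
)

# fifelse() requires date1 and date2 to have the same type (both character here);
# := adds final_date to dt by reference, so no reassignment is needed
dt[, final_date := fifelse(indicator == "x", date1, date2)]

print(dt)
```

Because := works by reference, dt itself gains the new column; there is no need to write dt <- dt[...].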

Answer 2

With the dplyr pipeline:

> dt %>% mutate(final_date = if_else(indicator == "x", date1, date2))
  indicator    date1    date2 final_date
1         x 20190111 20190105   20190111
2         y 20190212 20190215   20190215
3         x 20190512 20190616   20190512
4         y 20190723 20190623   20190623
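One thing worth spelling out about the pipeline above: mutate() returns a new object and leaves dt untouched, so the result has to be assigned to keep the column. A small sketch of that, where `result` is just an illustrative name of my own:

```r
library(data.table)
library(dplyr)

dt <- data.table(
  indicator = c("x", "y", "x", "y"),
  date1 = c("20190111", "20190212", "20190512", "20190723"),
  date2 = c("20190105", "20190215", "20190616", "20190623")
)

# mutate() does not modify dt; capture the returned object instead
result <- dt %>%
  mutate(final_date = if_else(indicator == "x", date1, date2))

# The new column exists on result, not on the original dt
"final_date" %in% names(result)
"final_date" %in% names(dt)
```

This is the practical difference from the := approach, which changes dt itself.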

Answer 3

Try this:

# required packages
library(dplyr)
library(data.table)
# reproduce your data
dt <- data.table(
  indicator = c("x", "y"),
  date1 = c("20190111", "20190212", "20190512", "20190723"),
  date2 = c("20190105", "20190215", "20190616", "20190623")
)
# create your variable final_date
dt[, final_date := case_when(indicator == "x" ~ date1,
                             TRUE ~ date2)]

Hope it helps.

Answer 1
3 2019-04-04 08:26:28

Answer 2
0 2019-04-04 08:25:06

Answer 3
0 2019-04-04 08:25:08
