
Product of list of columns in R data.table

I have a large list of column names (variables) of an R data.table, and I want to create a column containing the product of these columns.

Example:

library(data.table)

col_names <- c("season_1","season_2","season_3")
DT_example <- data.table(id=1:4,
                 season_1=c(1,1,0,0),
                 season_2=c(0,1,1,1),
                 season_3=c(1,1,1,0),
                 product=1)

data.table:

   id season_1 season_2 season_3 product
1:  1        1        0        1       1
2:  2        1        1        1       1
3:  3        0        1        1       1
4:  4        0        1        0       1

The solution I have uses a "for" loop, but it is not very efficient:

for(inc in col_names){
  nm1 <- as.symbol(inc)
  DT_example[,product:= product * eval(nm1)]
}

Result:

   id season_1 season_2 season_3 product
1:  1        1        0        1       0
2:  2        1        1        1       1
3:  3        0        1        1       0
4:  4        0        1        0       0

Is there a faster way to do this using data.table native syntax?

Here are four options. The first one is by far the most efficient, but it assumes we are dealing with only zeros and ones, because the row-wise minimum equals the product only for such values.

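# 1. Row-wise minimum; equals the product only for 0/1 data
DT_example[, product := do.call(pmin, .SD), .SDcols = patterns("season")]

# 2. Multiply the columns together element-wise
DT_example[, product := Reduce(`*`, .SD), .SDcols = patterns("season")]

# 3. Apply prod to each row
DT_example[, product := apply(.SD, 1, prod), .SDcols = patterns("season")]

# 4. Melt id + season columns to long format, then take the product per id
DT_example[, product := melt(.SD, id.vars = "id")[, prod(value), by = id]$V1,
           .SDcols = patterns("^id$|season")]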

# > DT_example
#    id season_1 season_2 season_3 product
# 1:  1        1        0        1       0
# 2:  2        1        1        1       1
# 3:  3        0        1        1       0
# 4:  4        0        1        0       0

Data:

DT_example <- data.table(
  id=1:4,
  season_1=c(1,1,0,0),
  season_2=c(0,1,1,1),
  season_3=c(1,1,1,0),
  product=1
)
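
The zeros-and-ones restriction on the pmin option can be seen with a small sketch. Here a plain list of columns stands in for .SD (which is itself a list of columns), and the values and variable names are made up for illustration only.

```r
# Hypothetical columns; a plain list stands in for .SD
binary_cols  <- list(a = c(1, 1, 0),   b = c(0, 1, 1))
general_cols <- list(a = c(2, 0.5, 3), b = c(4, 0.5, 0))

# On 0/1 data the row-wise minimum and the product agree
min_binary  <- do.call(pmin, binary_cols)    # 0 1 0
prod_binary <- Reduce(`*`, binary_cols)      # 0 1 0

# On other values they generally differ
min_general  <- do.call(pmin, general_cols)  # 2 0.5 0
prod_general <- Reduce(`*`, general_cols)    # 8 0.25 0
```

If the columns may hold anything other than 0 and 1, the Reduce option is probably the safer choice among the vectorised ones.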

We can use prod grouped by the sequence of rows after selecting the columns in .SDcols. With prod, there is also an na.rm option to remove NA elements if needed.

DT_example[,  Product := prod(.SD, na.rm = TRUE), by = 1:nrow(DT_example),
     .SDcols = patterns("season")]

Output:

DT_example
#   id season_1 season_2 season_3 product Product
#1:  1        1        0        1       1       0
#2:  2        1        1        1       1       1
#3:  3        0        1        1       1       0
#4:  4        0        1        0       1       0
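
To illustrate what na.rm changes, here is a minimal sketch on a made-up table (DT_na and the columns prod_keep and prod_drop are hypothetical names, not part of the question). Note that a row whose values are all NA gets a product of 1 when na.rm = TRUE, since the product of an empty set is 1.

```r
library(data.table)

# Hypothetical toy data: row 2 has one NA, row 3 is all NA
DT_na <- data.table(
  id       = 1:3,
  season_1 = c(2, NA, NA),
  season_2 = c(3, 4, NA)
)

# Row-wise product keeping NAs: any NA in a row makes the result NA
DT_na[, prod_keep := prod(.SD), by = 1:nrow(DT_na),
      .SDcols = patterns("season")]

# Row-wise product dropping NAs before multiplying
DT_na[, prod_drop := prod(.SD, na.rm = TRUE), by = 1:nrow(DT_na),
      .SDcols = patterns("season")]

DT_na$prod_keep  # 6 NA NA
DT_na$prod_drop  # 6  4  1
```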

I think you could use the "apply" and "prod" functions:

DT_example$product = apply(DT_example[,2:4], 1, prod)

This applies the "prod" function (which multiplies every element it receives) to every row of "DT_example[,2:4]". The "1" argument selects rows; "2" would select columns.
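
The difference between the two margin values can be sketched on a small matrix in base R (the matrix m and the result names are made up for illustration):

```r
# Hypothetical 2 x 3 matrix, filled row by row
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

# Margin 1: one product per row (1*2*3 and 4*5*6)
row_products <- apply(m, 1, prod)  # 6 120

# Margin 2: one product per column (1*4, 2*5, 3*6)
col_products <- apply(m, 2, prod)  # 4 10 18
```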
