简体   繁体   中英

Product of list of columns in R data.table

I have a large list of column names (variables) of an R data.table and I want to create a column containing the product of these columns.

Example:

col_names <- c("season_1","season_2","season_3")
DT_example <- data.table(id=1:4,
                 season_1=c(1,1,0,0),
                 season_2=c(0,1,1,1),
                 season_3=c(1,0,1,0),
                 product=1)

data.table:

   id season_1 season_2 season_3 product
1:  1        1        0        1       1
2:  2        1        1        1       1
3:  3        0        1        1       1
4:  4        0        1        0       1

The solution I have is using a "for" loop but it is not very efficient:

for(inc in col_names){
  nm1 <- as.symbol(inc)
  DT_example[,product:= product * eval(nm1)]
}

result:

   id season_1 season_2 season_3 product
1:  1        1        0        1       0
2:  2        1        1        0       0
3:  3        1        1        1       1
4:  4        0        1        0       0

Is there a faster way to do this using data.table native syntax?

Here are four options. The first one is by far the most efficient but assumes we are dealing with only zeros and ones.

DT_example[, product := do.call(pmin, .SD), .SDcols = patterns("season")]

DT_example[, product := Reduce(`*`, .SD), .SDcols = patterns("season")]

DT_example[, product := apply(.SD, 1, prod), .SDcols = patterns("season")]

DT_example[, product := melt(.SD, id.vars = "id")[, prod(value), by = id]$V1]

# > DT_example
#    id season_1 season_2 season_3 product
# 1:  1        1        0        1       0
# 2:  2        1        1        1       1
# 3:  3        0        1        1       0
# 4:  4        0        1        0       0

Data:

DT_example <- data.table(
  id=1:4,
  season_1=c(1,1,0,0),
  season_2=c(0,1,1,1),
  season_3=c(1,1,1,0),
  product=1
)

We can use prod grouped by sequence of rows after selecting the columns in .SDcols . With prod , there is na.rm option as well to remove NA elements if needed.

DT_example[,  Product := prod(.SD, na.rm = TRUE), by = 1:nrow(DT_example),
     .SDcols = patterns("season")]

-output

DT_example
#   id season_1 season_2 season_3 product Product
#1:  1        1        0        1       1       0
#2:  2        1        1        1       1       1
#3:  3        0        1        1       1       0
#4:  4        0        1        0       1       0

I think you could use "apply" and "prod" functions:

DT_example$product = apply(DT_example[,2:4], 1, prod)

This is applying the "prod" function (multiplies every element of what ir receives), to every line (defined by the "1" argument, as "2" would be column), of "DT_example[,2:4]".

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM