R Data.table filtering rows based a vector of columns greater than value

Question

I have a variable list of column names in a data.table, and would like to apply the same filter to all of them to subset the rows of my table (ie give me the rows where all the columns in the list are >= 5)

DT = data.table(
  ID = c("b","b","b","a","a","c"),
  a = 1:6,
  b = 7:12,
  c = 13:18
)
> DT
   ID a  b  c
1:  b 1  7 13
2:  b 2  8 14
3:  b 3  9 15
4:  a 4 10 16
5:  a 5 11 17
6:  c 6 12 18

cols= c("a", "b", "c")
DT[.SD >= 5, , .SDcols=cols] # something like this?
   ID a  b  c
1:  a 5 11 17
2:  c 6 12 18

Answer 1

If we need to have the conditions met for all the columns, create a list of logical vectors and then Reduce it to a single logical vector

DT[DT[, Reduce(`&`, lapply(.SD, `>=`, 5)), .SDcols = cols]]
#   ID a  b  c
#1:  a 5 11 17
#2:  c 6 12 18

Or another option with rowSums

DT[ DT[, rowSums(.SD >= 5) == length(cols), .SDcols = cols]]

NOTE: Both options are vectorized and are efficient

Benchmarks

DT1 <- DT[rep(seq_len(nrow(DT)), 1e6)]



system.time(DT1[ DT1[, rowSums(.SD >= 5) == length(cols), .SDcols = cols]])
#   user  system elapsed 
#  0.464   0.127   0.555 

system.time(DT1[DT1[, Reduce(`&`, lapply(.SD, `>=`, 5)), .SDcols = cols]])
#   user  system elapsed 
#  0.134   0.022   0.150 



system.time(DT1[ DT1[, apply(.SD >= 5, 1, all), .SDcols=cols], ])
#     user  system elapsed 
#  6.636   0.087   6.687

Answer 2

DT[ DT[, apply(.SD >= 5, 1, all), .SDcols=cols], ]
#    ID a  b  c
# 1:  a 5 11 17
# 2:  c 6 12 18

Answer 3

you could just use.SDcols = is.numeric if you were looking at all the numeric columns. saves a step

R Data.table filtering rows based a vector of columns greater than value

Question

3 answers

solution1
2 ACCPTED 2020-04-03 21:53:20

Benchmarks

solution2
1 2020-04-03 22:24:02

solution3
0 2022-01-18 12:23:24

R Data.table filtering rows based a vector of columns greater than value

Question

3 answers

solution1 2 ACCPTED 2020-04-03 21:53:20

Benchmarks

solution2 1 2020-04-03 22:24:02

solution3 0 2022-01-18 12:23:24

solution1
2 ACCPTED 2020-04-03 21:53:20

solution2
1 2020-04-03 22:24:02

solution3
0 2022-01-18 12:23:24