R - Fastest way to select the rows of a matrix that meet multiple conditions

Question

This is an extension of the question on returning the rows of a matrix that meet a condition in R. Say I have the matrix:

       one two three four
 [1,]   1   6    11   16
 [2,]   2   7    12   17
 [3,]   3   8    11   18
 [4,]   4   9    11   19
 [5,]   5  10    15   20
 [6,]   1   6    15   20
 [7,]   5   7    12   20

I want to return, as fast as possible, all rows where mat[, "two"] == 7 AND mat[, "three"] == 12. This is the way I know to do it:

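 # `$` does not work on a matrix (only on data frames); index columns by name instead
 out <- mat[mat[, "two"] == 7, , drop = FALSE]
 final_out <- out[out[, "three"] == 12, , drop = FALSE]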

There should obviously be a way to get final_out in a one-liner, something like final_out <- which(mat$two == 7 && mat$three == 12), that is both faster and more succinct than the two lines of code above.

What is the fastest R code for this multiple-condition matrix query?
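The guessed one-liner uses &&, which is R's scalar "and": it only looks at a single element, so it cannot build one TRUE/FALSE per row (recent R versions signal an error when it is given longer vectors; older versions silently used the first element). A minimal sketch with the vectorised & instead. It assumes the question's matrix is called mat and rebuilds it with named columns; the values are copied from the printout above, and the construction code is an assumption.

```r
# The question's matrix, rebuilt with named columns (assumed construction)
mat <- matrix(c(1, 2, 3, 4, 5, 1, 5,
                6, 7, 8, 9, 10, 6, 7,
                11, 12, 11, 11, 15, 15, 12,
                16, 17, 18, 19, 20, 20, 20),
              ncol = 4,
              dimnames = list(NULL, c("one", "two", "three", "four")))

# `&` compares element by element, giving one TRUE/FALSE per row
cond <- mat[, "two"] == 7 & mat[, "three"] == 12
which(cond)   # 2 7

# One-liner: keep the rows where cond is TRUE
final_out <- mat[cond, , drop = FALSE]
final_out
```

drop = FALSE keeps the result a matrix even when only one row matches.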

Answer 1

Just use [ subsetting with a logical comparison:

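#  Reproducible data
#  (the values below come from R < 3.6.0; R 3.6.0 changed the default
#  sample() algorithm, so newer versions print different numbers)
set.seed(1)
m <- matrix(sample(12, 28, replace = TRUE), 7, 4)
m
     [,1] [,2] [,3] [,4]
[1,]    4    8   10    3
[2,]    5    8    6    8
[3,]    7    1    9    2
[4,]   11    3   12    4
[5,]    3    3    5    5
[6,]   11    9   10    1
[7,]   12    5   12    5


#  Subset according to condition (a single matching row comes back as a
#  plain vector; add drop = FALSE to keep a one-row matrix)
m[ m[,2] == 3 & m[,3] == 12 , ]
[1] 11  3 12  4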

Answer 2

UPDATE USING MICROBENCHMARK:

Benchmarking gives the opposite result: it seems the answer given by @SimonO101 is slightly faster.

library(microbenchmark)
set.seed(1)
m <- matrix(sample(12, 100, replace = TRUE), 25, 4)
colnames(m) <- c("one","two","three","four")

# microbenchmark records times in nanoseconds
bench1 <- microbenchmark(m[which(m[,'two']==7 & m[,'three'] == 12, arr.ind = TRUE),])
summary(bench1$time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7700    8750    9449    9688    9800   22400

# note: this tests a different condition (column 2 == 3) than bench1
bench2 <- microbenchmark(m[ m[,2] == 3 & m[,3] == 12 , ])
summary(bench2$time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   6300    7350    7351    7599    8050   15400
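The two benchmarks above time different conditions (two == 7 versus column 2 == 3), so they are not a like-for-like comparison. A minimal sketch that checks both idioms on the same condition, assuming a randomly generated 1e6 x 4 matrix with the same column names; it uses base system.time rather than microbenchmark, so the timings are coarse.

```r
# Same condition for both idioms (matrix size is an arbitrary assumption)
set.seed(1)
m <- matrix(sample(12, 4e6, replace = TRUE), ncol = 4,
            dimnames = list(NULL, c("one", "two", "three", "four")))

cond <- m[, "two"] == 7 & m[, "three"] == 12

by_which   <- m[which(cond), , drop = FALSE]
by_logical <- m[cond, , drop = FALSE]

# Identical rows either way, because cond contains no NA here
stopifnot(identical(by_which, by_logical))

# Coarse base-R timing of the full expressions, 20 repetitions each
system.time(for (i in 1:20) m[which(m[, "two"] == 7 & m[, "three"] == 12), , drop = FALSE])
system.time(for (i in 1:20) m[m[, "two"] == 7 & m[, "three"] == 12, , drop = FALSE])
```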

OLD ANSWER:

Combining the answers given by @Jiber and @SimonO101 gives a slightly faster result, at least on my computer.

I made the matrix much larger so that the computation times are easier to tell apart.

set.seed(1)
# 1e9 integers: roughly 4 GB of memory
m <- matrix(sample(12, 1e9, replace = TRUE), 1e8, 10)
colnames(m) <- c("one","two","three","four","five","six","seven","eight","nine","ten")

system.time(m[which(m[,'two']==7 & m[,'three'] == 12, arr.ind = TRUE),])
   user  system elapsed 
   6.49    1.58    8.06 
# note: different condition (column 2 == 3) from the expression above
system.time(m[ m[,2] == 3 & m[,3] == 12 , ])
   user  system elapsed 
   8.23    1.29    9.52

This obviously assumes the matrix columns are named.
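A small sketch of what "assumes the columns are named" means in practice, using a hypothetical 2 x 4 matrix: character column indices fail until colnames() has been set, after which indexing by name and by position return the same column.

```r
# Hypothetical small matrix with no dimnames
m <- matrix(1:8, nrow = 2)

# Character column indices fail on an unnamed matrix
res <- tryCatch(m[, "two"], error = function(e) conditionMessage(e))
res   # "subscript out of bounds"

# After naming the columns, name and position pick the same column
colnames(m) <- c("one", "two", "three", "four")
stopifnot(identical(m[, "two"], m[, 2]))
```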

Answer 3

Use which with arr.ind = TRUE, as in:

> mat[which(mat[,"two"]==7 & mat[,"three"] == 12, arr.ind = TRUE),]
  one two three four
2   2   7    12   17
7   5   7    12   20
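One practical difference between which() and plain logical subsetting shows up when the condition contains NA: a logical index returns an extra row of NAs, while which() silently drops it. A sketch on a hypothetical 3-row matrix with a missing value in column two; it also checks that arr.ind = TRUE changes nothing when the argument is a plain vector (it only matters for arrays).

```r
# Hypothetical matrix with an NA in the "two" column
m <- matrix(c(1, 2, 3,
              7, NA, 7,
              12, 12, 11),
            ncol = 3,
            dimnames = list(NULL, c("one", "two", "three")))

cond <- m[, "two"] == 7 & m[, "three"] == 12   # TRUE NA FALSE

m[cond, , drop = FALSE]          # 2 rows: row 1 plus a row of NAs
m[which(cond), , drop = FALSE]   # 1 row: which() drops the NA

# arr.ind only matters when x has a dim attribute; on a vector it is a no-op
stopifnot(identical(which(cond, arr.ind = TRUE), which(cond)))
```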

Answer 4

If you have a lot of rows, it can still be better to subset in two steps, as the following code shows:

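library(microbenchmark)
set.seed(1)
m <- matrix(sample(12, 28, replace = TRUE), 12e6, 4)

#  Subset according to condition
microbenchmark(sample0 = m[ m[,2] == 3 & m[,3] == 12 , ], times = 10L)

# sample2 refers to sample1, so create it first; the two steps are timed
# separately, so add their times together before comparing with sample0
sample1 <- m[ m[,2] == 3, ]
microbenchmark(sample1 = m[ m[,2] == 3, ],
               sample2 = sample1[sample1[,3] == 12, ], times = 10L)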

The results:

microbenchmark(sample0=m[ m[,2] == 3 & m[,3] == 12 , ],times = 10L)
Unit: milliseconds
expr        min         lq        mean     median         uq        max neval
sample0 342.085212 347.333083 381.6039635 349.920741 375.383425 584.068743    10
microbenchmark(sample1=m[ m[,2] == 3, ],
              sample2= sample1[sample1[,3] == 12, ],times = 10L)
Unit: milliseconds
expr        min         lq        mean      median         uq        max neval cld
 sample1 188.647995 189.832552 215.9355769 194.2375715 199.118962 404.631420    10   b
 sample2   5.097811   5.262028   5.3260160   5.2868025   5.401471   5.571351    10  a
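Because sample1 and sample2 are timed as separate expressions, the cost of the two-step approach is roughly the sum of their times. A minimal sketch that wraps each approach in a function so the whole pipeline is timed at once; one_step and two_step are hypothetical helper names, the matrix size is an arbitrary assumption, and base system.time is used instead of microbenchmark.

```r
set.seed(1)
m <- matrix(sample(12, 4e6, replace = TRUE), ncol = 4)

# Hypothetical helper: both conditions combined in one logical index
one_step <- function(m) m[m[, 2] == 3 & m[, 3] == 12, , drop = FALSE]

# Hypothetical helper: filter on column 2 first, then filter the much
# smaller intermediate result on column 3
two_step <- function(m) {
  s1 <- m[m[, 2] == 3, , drop = FALSE]
  s1[s1[, 3] == 12, , drop = FALSE]
}

# Both approaches must return exactly the same rows
stopifnot(identical(one_step(m), two_step(m)))

# Time the complete pipelines, 10 repetitions each
system.time(for (i in 1:10) one_step(m))
system.time(for (i in 1:10) two_step(m))
```

The two-step version evaluates the second comparison only on the rows that survived the first one, which is where any saving comes from.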

Answer 5

The absolute fastest way in R will be ifelse, which, unlike if, allows vectorized conditionals. You can also cache vectors of conditionals (e.g. isSeven <- mat[, 'two'] == 7) and reuse them later.

I don't have a reproducible example here, but I would do something like:

ifelse(mat[, 'two'] == 7 & mat[, 'three'] == 12, "both", "not both")

You can plug other conditionals in there, or have it return anything that results in a conformable vector.
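Note that ifelse() returns one label per row rather than the matching rows themselves. A sketch of the caching idea on the question's matrix (rebuilt here with named columns; the construction code is an assumption): the cached logical vectors feed ifelse() for labelling and are then reused directly to select the rows.

```r
# The question's matrix, rebuilt with named columns (assumed construction)
mat <- matrix(c(1, 2, 3, 4, 5, 1, 5,
                6, 7, 8, 9, 10, 6, 7,
                11, 12, 11, 11, 15, 15, 12,
                16, 17, 18, 19, 20, 20, 20),
              ncol = 4,
              dimnames = list(NULL, c("one", "two", "three", "four")))

# Cache the per-column conditions once
isSeven  <- mat[, "two"] == 7
isTwelve <- mat[, "three"] == 12

# ifelse() labels every row; it does not return the rows themselves
label <- ifelse(isSeven & isTwelve, "both", "not both")
label

# Reuse the cached vectors to actually select the matching rows
mat[isSeven & isTwelve, , drop = FALSE]
```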

Answer summary (5 answers):

Answer 1: score 12, accepted, 2013-08-08 14:27:49

Answer 2: score 3, 2013-08-08 14:45:01

Answer 3: score 1, 2013-08-08 14:26:46

Answer 4: score 1, 2018-04-25 13:29:38

Answer 5: score -2, 2013-08-08 14:28:33