R - fastest way to select the rows of a matrix that satisfy multiple conditions
This is an extension of the question on returning the rows of a matrix that meet a condition in R. Say I have the matrix:
one two three four
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 11 18
[4,] 4 9 11 19
[5,] 5 10 15 20
[6,] 1 6 15 20
[7,] 5 7 12 20
I want to return all rows where column two equals 7 AND column three equals 12, as fast as possible. This is the way I know to do it:
out <- mat[mat[, "two"] == 7, , drop = FALSE]
final_out <- out[out[, "three"] == 12, , drop = FALSE]
There should obviously be a way to get the contents of final_out in a one-liner, something like:
final_out <- which(mat$two == 7 && mat$three == 12)
that is faster and more succinct than the two lines of code above.
What is the fastest R code to return this multiple-condition matrix query?
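To make the question reproducible, the example matrix can be typed in directly. This is a sketch: the name mat and the column names come from the question, and it assumes mat is a numeric matrix rather than a data frame. It runs the two-step filter described above; drop = FALSE keeps the result a matrix even if only one row survives a step.

```r
# Rebuild the example matrix from the question (values copied row by row).
mat <- matrix(c(1,  6, 11, 16,
                2,  7, 12, 17,
                3,  8, 11, 18,
                4,  9, 11, 19,
                5, 10, 15, 20,
                1,  6, 15, 20,
                5,  7, 12, 20),
              ncol = 4, byrow = TRUE,
              dimnames = list(NULL, c("one", "two", "three", "four")))

# Two-step baseline: filter on column "two", then on column "three".
# drop = FALSE keeps a matrix even if only one row survives a step.
out <- mat[mat[, "two"] == 7, , drop = FALSE]
final_out <- out[out[, "three"] == 12, , drop = FALSE]
print(final_out)
```

The two surviving rows are the ones starting with 2 and 5 in column one.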
Just use [ subsetting with a logical comparison:
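# Reproducible data (the matrix printed below comes from R < 3.6.0; sample() draws differently in later versions)
set.seed(1)
m <- matrix(sample(12, 28, replace = TRUE), 7, 4)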
[,1] [,2] [,3] [,4]
[1,] 4 8 10 3
[2,] 5 8 6 8
[3,] 7 1 9 2
[4,] 11 3 12 4
[5,] 3 3 5 5
[6,] 11 9 10 1
[7,] 12 5 12 5
# Subset according to condition
m[ m[,2] == 3 & m[,3] == 12 , ]
[1] 11 3 12 4
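Note that the output above is a plain vector, not a one-row matrix: when exactly one row matches, [ drops the dimensions by default. A minimal sketch of the difference, using the matrix printed above typed in literally (so it does not depend on the R version's sample()); the names keep, as_vector and as_matrix are just illustrative:

```r
# The matrix printed above, entered literally.
m <- matrix(c( 4, 8, 10, 3,
               5, 8,  6, 8,
               7, 1,  9, 2,
              11, 3, 12, 4,
               3, 3,  5, 5,
              11, 9, 10, 1,
              12, 5, 12, 5),
            ncol = 4, byrow = TRUE)

# One logical vector, combined element-wise with & (not &&).
keep <- m[, 2] == 3 & m[, 3] == 12

# Default drop = TRUE: a single matching row collapses to a plain vector.
as_vector <- m[keep, ]

# drop = FALSE: the result stays a 1 x 4 matrix.
as_matrix <- m[keep, , drop = FALSE]

is.matrix(as_vector)  # FALSE
dim(as_matrix)        # 1 4
```

Using drop = FALSE is worth it whenever later code expects rows, since the number of matches is not known in advance.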
UPDATE USING MICROBENCHMARK:
Using a benchmark gives the opposite answer: it seems the answer given by @SimonO101 provides a slightly faster implementation.
library(microbenchmark)
set.seed(1)
m <- matrix(sample(12, 100, replace = TRUE), 25, 4)
colnames(m) <- c("one","two","three","four")
bench1 <- microbenchmark(m[which(m[,'two']==7 & m[,'three'] == 12, arr.ind = TRUE),])
summary(bench1$time)  # times in nanoseconds
Min. 1st Qu. Median Mean 3rd Qu. Max.
7700 8750 9449 9688 9800 22400
bench2 <- microbenchmark(m[ m[,2] == 3 & m[,3] == 12 , ])
summary(bench2$time)  # times in nanoseconds
Min. 1st Qu. Median Mean 3rd Qu. Max.
6300 7350 7351 7599 8050 15400
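The two benchmarked expressions above test different conditions (two == 7 in the first, column 2 == 3 in the second), so they do not necessarily select the same rows. A sketch of a like-for-like comparison, assuming base R's system.time() with a simple loop is an acceptable stand-in for microbenchmark; with_which and with_logical are illustrative names, and timings vary by machine, so none are shown:

```r
set.seed(1)
m <- matrix(sample(12, 100, replace = TRUE), 25, 4)
colnames(m) <- c("one", "two", "three", "four")

# Both candidates use the SAME condition, so they should select the same rows.
with_which   <- function() m[which(m[, "two"] == 7 & m[, "three"] == 12), , drop = FALSE]
with_logical <- function() m[m[, "two"] == 7 & m[, "three"] == 12, , drop = FALSE]

# Sanity check before timing: both must return identical results.
stopifnot(identical(with_which(), with_logical()))

# Repeat each call many times so the totals are measurable with base R.
n <- 1e4
t_which   <- system.time(for (i in seq_len(n)) with_which())
t_logical <- system.time(for (i in seq_len(n)) with_logical())
rbind(which = t_which[["elapsed"]], logical = t_logical[["elapsed"]])
```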
OLD ANSWER:
Combining the answers given by @Jiber and @SimonO101 gives a slightly faster answer, at least on my computer. I made the matrix much larger to separate the computation times.
set.seed(1)
m <- matrix(sample(12, 1e9, replace = TRUE), 1e8, 10)
colnames(m) <- c("one","two","three","four","five","six","seven","eight","nine","ten")
system.time(m[which(m[,'two']==7 & m[,'three'] == 12, arr.ind = TRUE),])
user system elapsed
6.49 1.58 8.06
system.time(m[ m[,2] == 3 & m[,3] == 12 , ])
user system elapsed
8.23 1.29 9.52
This obviously assumes the matrix columns are named.
Use which with arr.ind = TRUE, as in:
> mat[which(mat[,"two"]==7 & mat[,"three"] == 12, arr.ind = TRUE),]
one two three four
2 2 7 12 17
7 5 7 12 20
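Two details about this approach. arr.ind only changes what which() returns when its argument is an array (a matrix or higher); for the plain logical vector built here it has no effect. Also, which() silently skips NA values, whereas indexing directly with the logical vector turns each NA into a row of NAs. A small sketch with a made-up NA to show both points (the data and variable names are illustrative):

```r
# A small matrix with an NA in column "three".
mat <- matrix(c(2, 7, 12, 17,
                5, 7, NA, 20,
                5, 7, 12, 20),
              ncol = 4, byrow = TRUE,
              dimnames = list(NULL, c("one", "two", "three", "four")))

cond <- mat[, "two"] == 7 & mat[, "three"] == 12   # TRUE NA TRUE

# On a plain logical vector, arr.ind makes no difference.
idx1 <- which(cond)
idx2 <- which(cond, arr.ind = TRUE)

by_which   <- mat[idx1, , drop = FALSE]   # which() skips the NA: 2 rows
by_logical <- mat[cond, , drop = FALSE]   # the NA becomes a row of NAs: 3 rows
```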
If you have a lot of rows, it can still be better to subset first, as you can see in the following code:
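library(microbenchmark)
set.seed(1)
# Note: only 28 values are drawn; matrix() recycles them to fill all 12e6 x 4 cells
m <- matrix(sample(12, 28, replace = TRUE), 12e6, 4)
# Subset according to condition
microbenchmark(sample0 = m[m[, 2] == 3 & m[, 3] == 12, ], times = 10L)
# sample1 must exist as a variable before the sample2 expression can refer to it
sample1 <- m[m[, 2] == 3, ]
microbenchmark(sample1 = m[m[, 2] == 3, ],
               sample2 = sample1[sample1[, 3] == 12, ], times = 10L)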
The results:
microbenchmark(sample0=m[ m[,2] == 3 & m[,3] == 12 , ],times = 10L)
Unit: milliseconds
expr min lq mean median uq max neval
sample0 342.085212 347.333083 381.6039635 349.920741 375.383425 584.068743 10
microbenchmark(sample1=m[ m[,2] == 3, ],
sample2= sample1[sample1[,3] == 12, ],times = 10L)
Unit: milliseconds
expr min lq mean median uq max neval cld
sample1 188.647995 189.832552 215.9355769 194.2375715 199.118962 404.631420 10 b
sample2 5.097811 5.262028 5.3260160 5.2868025 5.401471 5.571351 10 a
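The idea behind this benchmark is that the second comparison only runs on the rows that survived the first one. A sketch of both strategies as functions, checking that they return identical rows; subset_two_step and subset_one_step are hypothetical helper names, and whether the two-step version wins depends on how selective the first condition is:

```r
# Hypothetical helper: filter on the first condition, then test the second
# condition only on the rows that survived.
subset_two_step <- function(m, col1, val1, col2, val2) {
  first <- m[m[, col1] == val1, , drop = FALSE]    # pass 1 over all rows
  first[first[, col2] == val2, , drop = FALSE]     # pass 2 over survivors only
}

# One-step version: both comparisons are computed on every row.
subset_one_step <- function(m, col1, val1, col2, val2) {
  m[m[, col1] == val1 & m[, col2] == val2, , drop = FALSE]
}

set.seed(1)
m <- matrix(sample(12, 4e5, replace = TRUE), ncol = 4)   # 1e5 rows of test data

two_step <- subset_two_step(m, 2, 3, 3, 12)
one_step <- subset_one_step(m, 2, 3, 3, 12)
stopifnot(identical(two_step, one_step))

# Rough timings with base R; results depend on the data and the machine.
system.time(for (i in 1:20) subset_two_step(m, 2, 3, 3, 12))
system.time(for (i in 1:20) subset_one_step(m, 2, 3, 3, 12))
```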
The absolute fastest way in R will be ifelse, which, unlike if, allows for vectorized conditionals. You can also cache vectors of conditionals (e.g. isSeven <- mat[, 'two'] == 7) and use/reuse those later.
I don't have a reproducible example here, but I would do something like:
ifelse(mat[, 'two'] == 7 & mat[, 'three'] == 12, "both", "not both")
You can plop other conditionals in there, or have it return anything that results in a conformable vector.
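A sketch of the cached-conditions idea applied to the matrix from the question. isSeven follows the example above; isTwelve is an illustrative name for the second condition. Note that ifelse() returns one label per row (the same length as the condition) rather than the matching rows, so the same cached logical vectors are used to index mat when the rows themselves are needed:

```r
mat <- matrix(c(1,  6, 11, 16,
                2,  7, 12, 17,
                3,  8, 11, 18,
                4,  9, 11, 19,
                5, 10, 15, 20,
                1,  6, 15, 20,
                5,  7, 12, 20),
              ncol = 4, byrow = TRUE,
              dimnames = list(NULL, c("one", "two", "three", "four")))

# Cache each condition once as a logical vector ...
isSeven  <- mat[, "two"] == 7
isTwelve <- mat[, "three"] == 12

# ... then reuse them: ifelse() labels every row (it does not drop any) ...
labels <- ifelse(isSeven & isTwelve, "both", "not both")

# ... and the same cached vectors select the matching rows directly.
both <- mat[isSeven & isTwelve, , drop = FALSE]
```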