Looking for a more efficient way to filter an array
I have two arrays obtained from krige(), values and variances, each with a couple of million entries. The two arrays have the same length and match 1:1 with each other. I want to remove the values whose variance is above a certain threshold. I don't really need to modify values in place; generating a third array would be fine.
The following code works fine:
for (i in 1:length(values)) {
  if (variances[i] > 0.8) {
    values[i] = NA
  }
}
Unfortunately, it is very slow and uses only a single processor core. Do I really need to handle the parallel calculations manually? This sounds generic enough that it should be built in somehow, not only by using more than one core but perhaps also through vector processor instructions.
Please enlighten me.
As long as those arrays match, you should be able to just subset one with the other:
set.seed(1)
(values <- array(1:25, c(5,5)))
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   11   16   21
#> [2,]    2    7   12   17   22
#> [3,]    3    8   13   18   23
#> [4,]    4    9   14   19   24
#> [5,]    5   10   15   20   25
(variances <- array(rnorm(25,.8,0.2),c(5,5)))
#>           [,1]      [,2]      [,3]      [,4]      [,5]
#> [1,] 0.6747092 0.6359063 1.1023562 0.7910133 0.9837955
#> [2,] 0.8367287 0.8974858 0.8779686 0.7967619 0.9564273
#> [3,] 0.6328743 0.9476649 0.6757519 0.9887672 0.8149130
#> [4,] 1.1190562 0.9151563 0.3570600 0.9642442 0.4021297
#> [5,] 0.8659016 0.7389223 1.0249862 0.9187803 0.9239651
is.na(values[variances > .8]) <- TRUE
values
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   NA   16   NA
#> [2,]   NA   NA   NA   17   NA
#> [3,]    3   NA   13   NA   NA
#> [4,]   NA   NA   14   NA   24
#> [5,]   NA   10   NA   NA   NA
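Since the question mentions that a third array would be fine, here is a minimal sketch of a non-destructive variant. It relies on base R's replace(), which returns a modified copy; the name filtered and the toy data are only illustrative stand-ins for the krige() output.

```r
# Toy data standing in for the krige() output (shapes and values are assumed).
set.seed(1)
values    <- array(1:25, c(5, 5))
variances <- array(rnorm(25, 0.8, 0.2), c(5, 5))

# replace() returns a modified copy, so `values` itself is left untouched.
filtered <- replace(values, variances > 0.8, NA)

# Equivalent plain subset assignment, applied to a copy for comparison.
in_place <- values
in_place[variances > 0.8] <- NA

stopifnot(identical(filtered, in_place))  # both forms give the same array
stopifnot(!anyNA(values))                 # the original is unchanged
```

The plain assignment values[variances > 0.8] <- NA does the same thing as the is.na<- form above and may read more naturally to some.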
For an array of 10 million elements it takes about a second on my laptop, data generation included:
system.time({
  values <- array(1:10e6, c(1000,10000))
  variances <- array(rnorm(10e6,.8,0.2),dim(values))
  is.na(values[variances > .8]) <- TRUE
})
#>    user  system elapsed
#>    1.05    0.10    1.14
dim(variances)
#> [1] 1000 10000
object.size(variances)
#> 80000216 bytes
object.size(values)
#> 40000216 bytes
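To check on your own machine that the vectorized form matches the original loop, and to see how the two compare in speed, something like the following sketch could be used. The helper names loop_filter and vec_filter, the array size, and the synthetic data are all assumptions made for illustration; no timings are shown because they depend on the hardware.

```r
# Synthetic stand-ins for the krige() output; size and distribution are assumed.
set.seed(42)
n         <- 1e6
values    <- rnorm(n)
variances <- rnorm(n, 0.8, 0.2)

# Element-by-element loop, as in the question (hypothetical helper).
loop_filter <- function(values, variances, threshold = 0.8) {
  for (i in seq_along(values)) {
    if (variances[i] > threshold) values[i] <- NA
  }
  values
}

# Vectorized form: one comparison and one subset assignment (hypothetical helper).
vec_filter <- function(values, variances, threshold = 0.8) {
  values[variances > threshold] <- NA
  values
}

t_loop <- system.time(by_loop <- loop_filter(values, variances))
t_vec  <- system.time(by_vec  <- vec_filter(values, variances))

# Both approaches should produce exactly the same result.
stopifnot(identical(by_loop, by_vec))

# Elapsed seconds for each approach; the numbers vary by machine.
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

The vectorized version does its comparison and assignment in compiled code rather than in the R interpreter, so it needs no manual parallelism to avoid the per-element overhead of the loop.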
Created on 2023-01-18 with reprex v2.0.2