Efficient double for-loop with max operation
What is an efficient implementation of the following double for-loop in R?
set.seed(1)
u <- rnorm(100, 1)
v <- rnorm(100, 2)
x <- rnorm(100, 3)
y <- rnorm(100, 4)
sum = 0
for (i in 1:100){
  for (j in 1:100) {
    sum = sum + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
  }
}
The evaluation takes quite a while, especially for really long vectors. Is there a way to vectorize this double for-loop? Thank you very much.
Similar to the one given by @www (but in base R)
uv <- expand.grid(u, v)
xy <- expand.grid(x, y)
sum((1 - do.call(pmax, uv))*(1 - do.call(pmax, xy)))
# [1] 37270.31
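To see why this reproduces the double loop, here is a minimal sketch on two short made-up vectors (the names a, b, grid, pair_max and loop_max are only for illustration and are not part of the answer). expand.grid builds one row per (i, j) pair, and do.call(pmax, ...) takes the row-wise maximum of its two columns.

```r
# Hypothetical small inputs, chosen only for illustration
a <- c(1, 5)
b <- c(2, 3, 7)

# One row per (a[i], b[j]) pair; the first argument varies fastest
grid <- expand.grid(a, b)
nrow(grid)  # length(a) * length(b) rows

# A data frame is a list of columns, so do.call passes both columns
# to pmax, which returns their element-wise (row-wise) maximum
pair_max <- do.call(pmax, grid)

# The same values from an explicit double loop, visited in the same order
loop_max <- numeric(0)
for (j in seq_along(b)) {
  for (i in seq_along(a)) {
    loop_max <- c(loop_max, max(a[i], b[j]))
  }
}

all.equal(pair_max, loop_max)
```

Because only the sum over all pairs is needed, the order in which the pairs are generated does not matter for the final result.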
Benchmark
library(microbenchmark)
library(tidyverse)  # needed for the tidyverse entry below
microbenchmark(
  original = {
    SUM <- 0
    for (i in 1:100){
      for (j in 1:100) {
        # y is indexed by j, matching the loop in the question
        SUM <- SUM + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
      }
    }
  }
  , tidyverse = {
    dat <- data_frame(u, v, x, y)
    dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))
    sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
  }
  , expand = {
    uv <- expand.grid(u, v)
    xy <- expand.grid(x, y)
    sum((1 - do.call(pmax, uv))*(1 - do.call(pmax, xy)))
  }
  , outer = sum((1 - outer(u, v, pmax))*(1 - outer(x, y, pmax)))
)
# Unit: microseconds
#       expr       min         lq       mean     median        uq        max neval
#   original 12512.838 14315.3480 18210.6801 15189.9525 17504.480 217572.149   100
#  tidyverse  4373.285  4924.0305  5812.2483  5603.1585  6044.828  14461.375   100
#     expand   843.972   961.2120  1163.5428  1061.9080  1219.674   2865.911   100
#      outer   228.823   252.7905   301.5965   285.5315   322.832    686.055   100
Mine is faster. It uses outer instead of the loops, which is exactly what outer is meant for.
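As a minimal sketch of what outer does with pmax, here is the call on two short made-up vectors (a, b and m are hypothetical names used only for illustration). Each cell of the resulting matrix holds the maximum of one pair, so summing over the matrix covers every (i, j) combination of the double loop.

```r
# Hypothetical small inputs, chosen only for illustration
a <- c(1, 5)
b <- c(2, 3, 7)

# outer() calls pmax once on fully expanded vectors and reshapes the
# result into a length(a) x length(b) matrix: m[i, j] = max(a[i], b[j])
m <- outer(a, b, pmax)
m
#      [,1] [,2] [,3]
# [1,]    2    3    7
# [2,]    5    5    7
```

Note that outer allocates the full length(a) by length(b) matrix, so for very long vectors memory rather than time may become the limiting factor.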
First, the functions that do not need external packages: the OP's, the one in user20650's comment, and mine.
original <- function(u, v, x, y){
  sum1 = 0
  for (i in seq_along(u)){
    for (j in seq_along(v)) {
      sum1 = sum1 + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
    }
  }
  sum1
}
comment <- function(u, v, x, y){
  sum1 = 0
  for (i in seq_along(u)){
    sum1 = sum1 + (1 - pmax(u[i], v)) * (1 - pmax(x[i], y))
  }
  sum(sum1)
}
rui <- function(u, v, x, y){
  tmp1 <- outer(u, v, pmax)
  tmp2 <- outer(x, y, pmax)
  sum((1 - tmp1) * (1 - tmp2))
}
Now the functions in www's answer and in IceCreamToucan's answer.
library(tidyverse)
www <- function(u, v, x, y){
  dat <- data_frame(u, v, x, y)
  dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))
  SUM2 <- sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
  SUM2
}
IceCream <- function(u, v, x, y){
  uv <- expand.grid(u, v)
  xy <- expand.grid(x, y)
  sum((1 - do.call(pmax, uv))*(1 - do.call(pmax, xy)))
}
Test them all to see if the results are the same. Note that there are floating-point issues: the differences need not be exactly zero, which is why the comparison uses all.equal.
set.seed(1234)
u <- rnorm(1e2, 1)
v <- rnorm(1e2, 2)
x <- rnorm(1e2, 3)
y <- rnorm(1e2, 4)
o <- original(u, v, x, y)
c <- comment(u, v, x, y)
w <- www(u, v, x, y)
i <- IceCream(u, v, x, y)
r <- rui(u, v, x, y)
all.equal(o, c)
all.equal(o, w)
all.equal(o, i)
all.equal(o, r)
o - c
o - w
o - r
w - r
i - r
c - r
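The floating-point issue mentioned above can be illustrated with a minimal sketch that is independent of the functions being compared (s1 and s2 are hypothetical names). The functions add the same terms in different orders, and floating-point addition is not associative, so the totals can differ in the last bits even when every method is correct.

```r
# Summing the same numbers in a different order can change the last bits
s1 <- (0.1 + 0.2) + 0.3
s2 <- 0.1 + (0.2 + 0.3)

s1 == s2                   # FALSE: exact comparison sees the rounding difference
isTRUE(all.equal(s1, s2))  # TRUE: equal within the default tolerance (about 1.5e-8)
s1 - s2                    # tiny, on the order of 1e-16
```

This is why all.equal is the appropriate check here, while the raw differences such as o - r may print small non-zero values.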
Now the speed test.
library(microbenchmark)
library(ggplot2)
mb <- microbenchmark(
  loop = original(u, v, x, y),
  pmax = comment(u, v, x, y),
  tidy = www(u, v, x, y),
  ice = IceCream(u, v, x, y),
  outer = rui(u, v, x, y)
)
autoplot(mb)
Here is the output from your code.
set.seed(1)
u <- rnorm(100, 1)
v <- rnorm(100, 2)
x <- rnorm(100, 3)
y <- rnorm(100, 4)
SUM <- 0
for (i in 1:100){
  for (j in 1:100) {
    SUM <- SUM + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
  }
}
SUM
# [1] 37270.31
The same output can be generated by using the tidyverse and pmax. We first need to create the right combination for each vector, pairing every (u, x) with every (v, y). We can then use pmax to calculate the result.
library(tidyverse)
dat <- data_frame(u, v, x, y)
dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))
SUM2 <- sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
SUM2
# [1] 37270.31
The tidyverse and pmax method is faster than the for-loop.
library(microbenchmark)
microbenchmark(
  m1 = {
    SUM <- 0
    for (i in 1:100){
      for (j in 1:100) {
        # y is indexed by j, matching the loop in the question
        SUM <- SUM + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
      }
    }
  },
  m2 = {
    dat <- data_frame(u, v, x, y)
    dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))
    SUM2 <- sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
    SUM2
  })
# Unit: milliseconds
#  expr       min        lq      mean    median        uq      max neval cld
#    m1 13.983890 15.045932 17.579693 16.554175 18.267269 39.15417   100   b
#    m2  5.716827  6.226258  7.029025  6.735946  7.186002 14.09338   100   a