
Efficient double for-loop with max operation

What is an efficient implementation of the following double for-loop in R?

set.seed(1)
u <- rnorm(100, 1)
v <- rnorm(100, 2)
x <- rnorm(100, 3)
y <- rnorm(100, 4)
sum = 0
for (i in 1:100){
  for (j in 1:100) {
    sum = sum + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
  }
}

For really long vectors the evaluation takes quite a while. Is there a way to vectorize this double for-loop? Thank you very much.

Similar to the one given by @www (but in base R):

uv <- expand.grid(u, v)
xy <- expand.grid(x, y)

sum((1 - do.call(pmax, uv))*(1 - do.call(pmax, xy)))

# [1] 37270.31
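To see why this matches the double loop, here is a minimal sketch on two tiny vectors (the values are made up for illustration). expand.grid lists every (u[i], v[j]) pair as one row, and do.call(pmax, ...) takes the row-wise maximum of the two columns.

```r
# Illustrative values only, not the data from the question.
u <- c(1, 5)
v <- c(2, 3, 4)

# One row per (u[i], v[j]) pair; the first column varies fastest.
uv <- expand.grid(u, v)
nrow(uv)  # length(u) * length(v) rows

# A data frame is a list of columns, so do.call passes each column
# to pmax as a separate argument: element-wise max across the columns.
pair_max <- do.call(pmax, uv)
pair_max
```

Each element of pair_max corresponds to one max(u[i], v[j]) term of the inner loop, so summing over the vector covers the same pairs as the two nested loops.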

Benchmark

library(microbenchmark)

microbenchmark(
  original = {
    SUM <- 0
    for (i in 1:100){
      for (j in 1:100) {
        SUM <- SUM + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
      }
    }
  }
  , tidyverse = {
      dat <- tibble(u, v, x, y)  # tibble() replaces the deprecated data_frame()
      dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))

      sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
    }
  , expand = {
      uv <- expand.grid(u, v)
      xy <- expand.grid(x, y)

      sum((1 - do.call(pmax, uv))*(1 - do.call(pmax, xy)))
    }
  , outer = sum((1 - outer(u, v, pmax))*(1 - outer(x, y, pmax)))
)

# Unit: microseconds
#       expr       min         lq       mean     median        uq        max neval
#   original 12512.838 14315.3480 18210.6801 15189.9525 17504.480 217572.149   100
#  tidyverse  4373.285  4924.0305  5812.2483  5603.1585  6044.828  14461.375   100
#     expand   843.972   961.2120  1163.5428  1061.9080  1219.674   2865.911   100
#      outer   228.823   252.7905   301.5965   285.5315   322.832    686.055   100

Mine is faster. It uses outer instead of the loops, which is what outer is meant for.
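As a minimal sketch of what outer does here (the small vectors are made up for illustration): outer(u, v, pmax) returns a length(u) by length(v) matrix whose [i, j] entry is max(u[i], v[j]), so the whole double loop collapses to one element-wise product and one sum.

```r
# Illustrative values only, not the data from the question.
u <- c(1, 5); v <- c(2, 3, 4)
x <- c(0, 2); y <- c(1, 1, 3)

m_uv <- outer(u, v, pmax)  # m_uv[i, j] == max(u[i], v[j])
m_xy <- outer(x, y, pmax)  # m_xy[i, j] == max(x[i], y[j])

vectorized <- sum((1 - m_uv) * (1 - m_xy))

# The same quantity with the explicit double loop, for comparison.
looped <- 0
for (i in seq_along(u)) {
  for (j in seq_along(v)) {
    looped <- looped + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
  }
}

vectorized  # 22
looped      # 22
```

The function passed to outer must be vectorized, which is why pmax is used rather than max.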

First, the functions that do not need external packages: the OP's, the one in user20650's comment, and mine.

original <- function(u, v, x, y){
  sum1 = 0
  for (i in seq_along(u)){
    for (j in seq_along(v)) {
      sum1 = sum1 + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
    }
  }
  sum1
}

comment <- function(u, v, x, y){
  sum1 = 0
  for (i in seq_along(u)){
    sum1 = sum1 + (1 - pmax(u[i], v)) * (1 - pmax(x[i], y))
  }
  sum(sum1)
}

rui <- function(u, v, x, y){
  tmp1 <- outer(u, v, pmax)
  tmp2 <- outer(x, y, pmax)
  sum((1 - tmp1) * (1 - tmp2))
}

Now the functions in www's answer and in IceCreamToucan's answer.

library(tidyverse)

www <- function(u, v, x, y){
  dat <- tibble(u, v, x, y)  # tibble() replaces the deprecated data_frame()
  dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))
  SUM2 <- sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
  SUM2
}

IceCream <- function(u, v, x, y){
  uv <- expand.grid(u, v)
  xy <- expand.grid(x, y)
  sum((1 - do.call(pmax, uv))*(1 - do.call(pmax, xy)))
}

Test them all to see if the results are the same. Note that there are floating-point issues, so the comparison uses all.equal rather than exact equality.
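A minimal sketch of the floating-point point, using its own made-up data: the loop and the vectorized version add the same terms in a different order, so their results may differ in the last few bits. all.equal compares within a small tolerance, whereas == demands bit-for-bit equality.

```r
# The classic illustration of why exact comparison is fragile.
0.1 + 0.2 == 0.3                  # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE

# The same idea applied to the sum from the question (illustrative data).
set.seed(42)
u <- rnorm(100, 1); v <- rnorm(100, 2)
x <- rnorm(100, 3); y <- rnorm(100, 4)

loop_sum <- 0
for (i in seq_along(u)) {
  for (j in seq_along(v)) {
    loop_sum <- loop_sum + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
  }
}
outer_sum <- sum((1 - outer(u, v, pmax)) * (1 - outer(x, y, pmax)))

same_within_tolerance <- isTRUE(all.equal(loop_sum, outer_sum))
same_within_tolerance
loop_sum - outer_sum  # may be a tiny nonzero value rather than exactly 0
```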

set.seed(1234)

u <- rnorm(1e2, 1)
v <- rnorm(1e2, 2)
x <- rnorm(1e2, 3)
y <- rnorm(1e2, 4)

o <- original(u, v, x, y)
c <- comment(u, v, x, y)
w <- www(u, v, x, y)
i <- IceCream(u, v, x, y)
r <- rui(u, v, x, y)

all.equal(o, c)
all.equal(o, w)
all.equal(o, i)
all.equal(o, r)

o - c
o - w
o - r
w - r
i - r
c - r

Now the speed test.

library(microbenchmark)
library(ggplot2)

mb <- microbenchmark(
  loop = original(u, v, x, y),
  pmax = comment(u, v, x, y),
  tidy = www(u, v, x, y),
  ice = IceCream(u, v, x, y),
  outer = rui(u, v, x, y)
)

autoplot(mb)
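The benchmark uses vectors of length 100. For the really long vectors mentioned in the question, outer allocates full length(u) by length(v) matrices, which may not fit in memory. One possible workaround, sketched here under that assumption, is to apply outer to blocks of u and x at a time. The function name chunked_sum and the parameter chunk_size are hypothetical and do not come from any of the answers.

```r
# Hypothetical helper: same result as the outer() approach, but only
# chunk_size * length(v) values are held in memory at once.
chunked_sum <- function(u, v, x, y, chunk_size = 1000L) {
  stopifnot(length(u) == length(x), length(v) == length(y))
  if (length(u) == 0L) return(0)
  total <- 0
  for (start in seq(1L, length(u), by = chunk_size)) {
    idx <- start:min(start + chunk_size - 1L, length(u))
    total <- total +
      sum((1 - outer(u[idx], v, pmax)) * (1 - outer(x[idx], y, pmax)))
  }
  total
}

# Usage on illustrative data, checked against the unchunked version.
set.seed(1)
u <- rnorm(250, 1); v <- rnorm(250, 2)
x <- rnorm(250, 3); y <- rnorm(250, 4)

full    <- sum((1 - outer(u, v, pmax)) * (1 - outer(x, y, pmax)))
chunked <- chunked_sum(u, v, x, y, chunk_size = 100L)
all.equal(full, chunked)
```

Larger chunks mean fewer loop iterations but more memory per step, so the best chunk_size depends on the machine and is worth benchmarking.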


Here is the output from your code.

set.seed(1)

u <- rnorm(100, 1)
v <- rnorm(100, 2)
x <- rnorm(100, 3)
y <- rnorm(100, 4)
SUM <- 0
for (i in 1:100){
  for (j in 1:100) {
    SUM <- SUM + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
  }
}
SUM
# [1] 37270.31

The same output can be generated by using the tidyverse and pmax. We first need to create every combination of the (u, x) pairs with the (v, y) pairs. We can then use pmax to calculate the result.

library(tidyverse)

dat <- tibble(u, v, x, y)  # tibble() replaces the deprecated data_frame()
dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))

SUM2 <- sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
SUM2
# [1] 37270.31

The tidyverse and pmax method is faster than the for-loop.

library(microbenchmark)

microbenchmark(
  m1 = {
    SUM <- 0
    for (i in 1:100){
      for (j in 1:100) {
        SUM <- SUM + (1 - max(u[i], v[j])) * (1 - max(x[i], y[j]))
      }
    }
  },
  m2 = {
    dat <- tibble(u, v, x, y)  # tibble() replaces the deprecated data_frame()
    dat2 <- dat %>% complete(nesting(u, x), nesting(v, y))

    SUM2 <- sum(with(dat2, (1 - pmax(u, v)) * (1 - pmax(x, y))))
    SUM2
  })
# Unit: milliseconds
#  expr       min        lq      mean    median        uq      max neval cld
#    m1 13.983890 15.045932 17.579693 16.554175 18.267269 39.15417   100   b
#    m2  5.716827  6.226258  7.029025  6.735946  7.186002 14.09338   100  a 
