Is there a way for me to avoid the for loop or make it more efficient?

I want to pick one element from x, one element from y (x and y are mutually exclusive), and one element from x or y that has not already been selected. I then want to repeat the process a specified number of times and store the results of each trial in a data frame. (Note: I am not interested in finding every possible combination.)

The code below works but slows considerably as the number of trials increases.

x <- 1:4
y <- 5:8
z <- c(x, y) #edited - previous code read a, b in place of x, y
trials <- 5
sel <- data.frame()
set.seed(123)
for (i in 1:trials){
    x_sel <- sample(x, 1)
    y_sel <- sample(y, 1)
    rem <- z[!(z %in% c(x_sel, y_sel))]
    z_sel <- sample(rem, 1)
    sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
}

This should probably be somewhat faster, but I doubt it's the fastest possible. I would think Rcpp would certainly be the fastest.

> set.seed(123)
> x <- 1:4
> y <- 5:8
> z <- c(x, y)
> trials <- 5
> 
> xval <- sample(x,size = trials,replace = TRUE)
> yval <- sample(y,size = trials,replace = TRUE)
> zval <- mapply(FUN = function(x,y,z) {sample(setdiff(z,c(x,y)),1)},
                 x = xval,
                 y = yval,
                 MoreArgs = list(z = z))
> 
> result <- data.frame(xval = xval,
                       yval = yval,
                       zval = zval)
> result
  xval yval zval
1    2    5    8
2    4    7    3
3    2    8    6
4    4    7    5
5    4    6    1

At only 10k samples, this appears to be ~37x faster than your for loop (which was inefficient primarily because it appends rows to sel one at a time, not because of anything inherent in the for loop). The difference between this and a more sensibly written for loop would likely be much less.
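As a rough illustration of what a "more sensibly written" for loop could look like, here is a minimal sketch that preallocates the result once and fills it row by row, instead of growing sel with rbind() on every iteration. The function name sample_trials_loop and the choice of an integer matrix for storage are assumptions made for this sketch, not part of the original code, and no timings are claimed for it.

```r
# Sketch: same sampling logic as the question, but with preallocated storage.
# `sample_trials_loop` is a hypothetical name used only for this example.
sample_trials_loop <- function(x, y, trials) {
  z <- c(x, y)
  # Allocate the full result up front; rows are filled in place below
  sel <- matrix(NA_integer_, nrow = trials, ncol = 3,
                dimnames = list(NULL, c("x_sel", "y_sel", "z_sel")))
  for (i in seq_len(trials)) {
    x_sel <- x[sample.int(length(x), 1)]
    y_sel <- y[sample.int(length(y), 1)]
    rem <- z[!(z %in% c(x_sel, y_sel))]
    # Index with sample.int() so a length-one `rem` is not treated as 1:rem
    z_sel <- rem[sample.int(length(rem), 1)]
    sel[i, ] <- c(x_sel, y_sel, z_sel)
  }
  as.data.frame(sel)
}

set.seed(123)
res <- sample_trials_loop(1:4, 5:8, trials = 1000)
head(res)
```

The loop body is unchanged in spirit; only the storage strategy differs, which is the part the answer identifies as the main cost.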

My approach is not elegant, but it seems to be efficient when the number of trials is large. To show this, I created 3 functions: f1 (yours), f2 (joran's), and f3 (mine).

library(microbenchmark)

f1 <- function() {
   x <- 1:4
   y <- 5:8
   z <- c(x, y) #edited - previous code read a, b in place of x, y
   trials <- 5000
   sel <- data.frame()
   set.seed(123)
   for (i in 1:trials) {
      x_sel <- sample(x, 1)
      y_sel <- sample(y, 1)
      rem <- z[!(z %in% c(x_sel, y_sel))]
      z_sel <- sample(rem, 1)
      sel <- rbind(sel, cbind(x_sel, y_sel, z_sel))
   }
   return(sel)
}

f2 <- function() {
   set.seed(123)
   x <- 1:4
   y <- 5:8
   z <- c(x, y)
   trials <- 5000

   xval <- sample(x, size = trials, replace = TRUE)
   yval <- sample(y, size = trials, replace = TRUE)
   zval <-
      mapply(
         FUN = function(x, y, z) {
            sample(setdiff(z, c(x, y)), 1)
         },
         x = xval,
         y = yval,
         MoreArgs = list(z = z)
      )

   result <- data.frame(xval = xval,
                        yval = yval,
                        zval = zval)
   return(result)
}


f3 <- function() {
   x <- 1:4
   y <- 5:8
   z <- c(x, y) #edited - previous code read a, b in place of x, y
   trials <- 5000
   set.seed(123)
   x_sel <- sample(x, trials, replace = TRUE)
   y_sel <- sample(y, trials, replace = TRUE)
   z_mac <- matrix(z,
                   nrow = trials,
                   ncol = length(z),
                   byrow = TRUE)
   take <- z_mac != x_sel & z_mac != y_sel
   z_sel <- t(matrix(t(z_mac)[t(take)], ncol = trials))
   take <- sample(1:ncol(z_sel), size = trials, replace = TRUE)
   cbind(x_sel, y_sel, z_sel = z_sel[cbind(1:trials, take)])
}


microbenchmark(f1(), f2(), f3(), times = 10L)

Unit: milliseconds
 expr         min          lq        mean      median          uq         max neval
 f1() 2193.448113 2248.442450 2258.626023 2258.135072 2267.333956 2346.457082    10
 f2()  205.124501  208.672947  213.520267  212.208095  219.397101  222.990083    10
 f3()    2.463567    2.491762    2.570517    2.512588    2.603582    2.827863    10

Comparing mean times, my f3 function is about 880 times faster than f1 and about 83 times faster than f2. For the original problem (trials = 5), the timings are:

> microbenchmark(f1(), f2(), f3(), times = 10L)
Unit: microseconds
 expr      min       lq      mean    median       uq      max neval
 f1() 1215.924 1268.790 1296.7610 1300.5095 1321.015 1370.998    10
 f2()  587.937  590.500  619.6248  612.9285  638.881  687.261    10
 f3()   68.886   78.819   86.7652   81.2225   91.315  116.947    10
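The masking step inside f3 is the least obvious part, so here is a small trace of it on three trials. The x_sel, y_sel, and pick values are fixed by hand (an assumption for illustration, where f3 draws them with sample()) so that the intermediate result is reproducible; z_rem and pick are names introduced only for this sketch.

```r
z <- 1:8
x_sel <- c(2L, 4L, 1L)   # hand-picked stand-ins for sample(x, trials, replace = TRUE)
y_sel <- c(5L, 7L, 8L)   # hand-picked stand-ins for sample(y, trials, replace = TRUE)
trials <- length(x_sel)

# One copy of z per trial, one trial per row
z_mac <- matrix(z, nrow = trials, ncol = length(z), byrow = TRUE)

# x_sel and y_sel are recycled down the rows, so row i is compared
# against x_sel[i] and y_sel[i]
take <- z_mac != x_sel & z_mac != y_sel

# Keep the TRUE entries row by row; every row keeps length(z) - 2 values
z_rem <- t(matrix(t(z_mac)[t(take)], ncol = trials))
z_rem
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    1    3    4    6    7    8
# [2,]    1    2    3    5    6    8
# [3,]    2    3    4    5    6    7

# Choose one column per row; f3 draws these positions with sample()
pick <- c(1L, 6L, 3L)
z_sel <- z_rem[cbind(seq_len(trials), pick)]
z_sel
# [1] 1 8 4
```

The reshape into a rectangular matrix works only because every trial removes exactly two distinct values from z, which holds here since x and y do not overlap.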
