
How to avoid unnecessary for loops in R

I have two vectors drawn from a t-distribution, X and epsilon, and I generate Y from these two vectors based on a condition. I aim to simulate multiple samples. If I simulate 10,000 samples, it takes a long time for the computer to complete. I want to reduce the computation time by avoiding the for loop. I've tried a few things, but they didn't work. How can I avoid the for loop and reduce the computation time for this specific loop? The code is as follows:

X <- rt(1250,5)
eps <- rt(1250,5)
Y <- replicate(1250,0)

for(i in 1:1250) {
  if(X[i]>quantile(X, 0.5)){
    Y[i] = X[i] + eps[i]
  }
  else { 
    Y[i] = 1.5*X[i] + eps[i]
  }
}

Many R functions run their loops at the C level, and you can use them instead of writing a for loop in R. This is called 'vectorisation', and it is a powerful concept in R. The function ifelse is vectorised, as are the + and * functions. Hadley Wickham explains it here.

X <- rt(1250,5)
eps <- rt(1250,5)

Y <- numeric(1250)
for(i in 1:1250) {
  if(X[i]>quantile(X, 0.5)){
    Y[i] = X[i] + eps[i]
  }
  else { 
    Y[i] = 1.5*X[i] + eps[i]
  }
}

Y_vectorized <- ifelse(X > quantile(X, 0.5), X + eps, 1.5*X + eps) 

With the result:

> identical (Y,Y_vectorized)
[1] TRUE
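
As a side note, the same vectorised result can also be reached without ifelse, by using a logical vector as an index. The following is a minimal sketch rather than part of the original answer; the names above and Y_indexed are made up for illustration.

```r
set.seed(1)
X <- rt(1250, 5)
eps <- rt(1250, 5)

# Logical vector: TRUE where X lies above its median
above <- X > quantile(X, 0.5)

# Start with the "else" branch for every element ...
Y_indexed <- 1.5 * X + eps
# ... then overwrite only the elements where the condition holds
Y_indexed[above] <- X[above] + eps[above]

# Compare with the ifelse version
identical(Y_indexed, ifelse(above, X + eps, 1.5 * X + eps))
```

Both forms evaluate the condition once for the whole vector; which one reads better is largely a matter of taste.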

How much faster is the vectorised approach (using r2evans' suggestion to keep the quantile calculation out of the loop)?

library(microbenchmark)
Y <- numeric(1250)
med <- quantile(X, 0.5)
microbenchmark("for-loop" = {
  for (i in 1:1250) {
    if (X[i] > quantile(X, 0.5)) {
      Y[i] = X[i] + eps[i]
    }
    else {
      Y[i] = 1.5 * X[i] + eps[i]
    }
  }
}, 
"vectorised" = { Y_vectorized <- ifelse(X > med, X + eps, 1.5 * X + eps) },
times = 100)

Unit: microseconds
       expr      min       lq       mean    median        uq      max neval
   for-loop 120488.2 123000.6 131055.758 125508.95 131246.40 247101.6   100
 vectorised     30.2     36.1     48.955     51.15     53.75    139.6   100

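For a vector length of 1250, the vectorised approach is roughly 2,700 times faster by mean time (131055.758 / 48.955 ≈ 2677). Note that the for loop as benchmarked still calls quantile(X, 0.5) on every iteration, so part of that gap comes from the repeated quantile calculation and not from the loop itself.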
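
To see how much of the difference is due to the loop alone, one could time a loop that reuses the stored median against the vectorised call. This is only a sketch: the function names loop_hoisted and vectorised are hypothetical, base R's system.time is used instead of microbenchmark, and timings will differ from machine to machine.

```r
set.seed(1)
n <- 1250
X <- rt(n, 5)
eps <- rt(n, 5)
med <- quantile(X, 0.5)  # computed once, outside the loop

# For loop that reuses the stored median
loop_hoisted <- function() {
  Y <- numeric(n)
  for (i in seq_len(n)) {
    if (X[i] > med) {
      Y[i] <- X[i] + eps[i]
    } else {
      Y[i] <- 1.5 * X[i] + eps[i]
    }
  }
  Y
}

# Vectorised equivalent
vectorised <- function() ifelse(X > med, X + eps, 1.5 * X + eps)

# Both variants should produce the same vector
stopifnot(identical(loop_hoisted(), vectorised()))

# Rough timings over 100 repetitions each
system.time(for (r in 1:100) loop_hoisted())
system.time(for (r in 1:100) vectorised())
```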

  1. Don't recalculate quantile(X,0.5) each and every time: it never changes, so calculate it once and reuse the stored value.

  2. Use vectorized operations, knowing that comparisons and assignments can happen a whole vector at a time. I suggest you reduce the code to:

     X <- rt(1250,5)
     eps <- rt(1250,5)
     med <- quantile(X, 0.5)
     Y <- ifelse(X > med, 1, 1.5) * X + eps
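
Since the question is about simulating many samples, the two points above could be wrapped in a small function and repeated. The sketch below is an assumption about how that might look, not code from the original post: simulate_Y, n_obs and n_samples are hypothetical names, and n_samples is kept small here (the question mentions 10,000).

```r
set.seed(123)
n_obs <- 1250     # observations per sample, as in the question
n_samples <- 100  # illustrative value only

# Generate one sample of Y with the median computed once
simulate_Y <- function(n) {
  X <- rt(n, 5)
  eps <- rt(n, 5)
  med <- quantile(X, 0.5)
  ifelse(X > med, 1, 1.5) * X + eps
}

# One column per simulated sample
Y_mat <- replicate(n_samples, simulate_Y(n_obs))
dim(Y_mat)
```

replicate still iterates over the samples, but the work inside each sample is vectorised, which is where the per-element loop spent its time.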

A quick walk-through of vectorized operations is demonstrated here:

set.seed(42)
X <- rt(10, 5)
eps <- rt(10, 5)
med <- quantile(X, 0.5)
X
#  [1]  1.9151  0.0878 -0.0773 -0.0618 -0.0480  5.0230  1.0924  0.8423  1.5165
# [10] -0.2601
eps
#  [1]  0.712 -1.048  2.233 -0.737 -1.273 -0.890  0.395 -1.828 -0.601 -0.392
med
#   50% 
# 0.465 

If we compare a vector with a scalar (or with another vector of the same length), we get a logical/boolean vector:

X > med
#  [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE

From this, ifelse is a vectorized conditional. While if (...) {...} else {...} can deal with only a single length-1 logical at a time, ifelse works on a whole vector at a time. For instance:

ifelse(c(T, F, F, T), 1:4, 11:14)
# [1]  1 12 13  4

Back to the example, we continue with ifelse and add math operations, which work just as well on a vector as on a scalar.

ifelse(X > med, 1, 1.5)
#  [1] 1.0 1.5 1.5 1.5 1.5 1.0 1.0 1.0 1.0 1.5
ifelse(X > med, 1, 1.5) * X
#  [1]  1.9151  0.1318 -0.1160 -0.0926 -0.0720  5.0230  1.0924  0.8423  1.5165
# [10] -0.3902
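
The last step of the walk-through is to add eps. As a sanity check, the sketch below compares the vectorised result with an element-by-element loop on the same 10 values; Y_check is a name introduced here for illustration.

```r
set.seed(42)
X <- rt(10, 5)
eps <- rt(10, 5)
med <- quantile(X, 0.5)

# Full vectorised expression from the walk-through
Y <- ifelse(X > med, 1, 1.5) * X + eps

# Element-by-element reference using the original condition
Y_check <- numeric(length(X))
for (i in seq_along(X)) {
  Y_check[i] <- if (X[i] > med) X[i] + eps[i] else 1.5 * X[i] + eps[i]
}

# Should report TRUE if both approaches agree
all.equal(Y, Y_check)
```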
