doRNG and sequential random numbers differ with same seed - R (foreach, doParallel, doRNG)

I'm running a simulation that repeats an independent calculation for a number of samples, and I want to parallelize it to speed it up. In each sample I generate some random numbers (using rnorm). I have read (and seen) that random numbers generated under doParallel are not reproducible, so I wanted to use doRNG, which does generate the same random numbers regardless of the number of cores. What surprised me is that doRNG produces different numbers than a sequential for loop, even when I don't register a parallel backend and the calculation therefore runs sequentially (with the %dorng% operator I get the same results as with a parallel backend registered). With %dopar% and no parallel backend registered, however, I do get the same numbers as the for loop. Why is that? Can I somehow parametrize foreach/doRNG to get the same random numbers as in the sequential for loop? I wanted to use this as a check that I didn't break anything while moving to parallel.

Below is a simplified example (note that I do not register a parallel backend):

library(foreach)
library(doRNG)
library(doParallel)
RNGkind("L'Ecuyer-CMRG")

set.seed(123)
rn3 <- foreach(i=1:20, .combine = 'c') %dopar%{ 
  return(rnorm(1,0,1))
}


rn1 <- foreach(i=1:20, .combine = 'c', .options.RNG=123) %dorng%{ 
  return(rnorm(1,0,1))
}

set.seed(123)
rn2 <- foreach(i=1:20, .combine = 'c') %dorng%{ 
  return(rnorm(1,0,1))
}


rn4 <- rep(0,20)
set.seed(123)
for(i in 1:20){
  rn4[i] <- (rnorm(1,0,1))
}

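identical(rn1, rn2)  # %dorng% seeded via .options.RNG vs. via set.seed()
identical(rn1, rn3)  # %dorng% vs. %dopar%
identical(rn1, rn4)  # %dorng% vs. plain for loop
identical(rn3, rn4)  # %dopar% vs. plain for loop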

This shows that rn1 and rn2 (two different ways of setting the seed with %dorng%) are identical, as are rn3 and rn4 (%dopar% and the for loop), but rn1/rn2 do not match rn3/rn4.

EDIT: I realized that different pseudo-random number generators are involved. %dorng% uses L'Ecuyer-CMRG, while the default in base R is Mersenne-Twister. However, when I set base R to L'Ecuyer-CMRG as well, only the first number matches. I've adjusted the code above to set the PRNG explicitly.
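As a minimal sketch of the generator mismatch mentioned in the edit, the snippet below draws from both generators with the same seed. It uses only base R; the variable names (old_kind, mt, lec) are made up for illustration.

```r
# Remember the current RNG settings so they can be restored afterwards.
old_kind <- RNGkind()

# Draw three normals with the base R default generator.
RNGkind("Mersenne-Twister")
set.seed(123)
mt <- rnorm(3)

# Draw three normals with the generator that %dorng% relies on.
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
lec <- rnorm(3)

# Same seed, different generator: the two vectors are not expected to agree.
print(identical(mt, lec))

# Switch back to the generator that was active before.
RNGkind(old_kind[1])
```

So matching the generator is a precondition for any comparison, but as noted it is not sufficient on its own to reproduce the %dorng% output.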

OK, I finally found the reason (the comment I made a moment ago helped). What %dorng% does is generate a separate random seed for each value of i in the foreach loop. So to get the same numbers as %dorng% from a for loop, we first need to use the L'Ecuyer-CMRG PRNG, and then set, for each iteration, the same seed that %dorng% used. Code that replicates this in a for loop:

RNGkind("L'Ecuyer-CMRG")

rn6 <- rep(0,20)
for(i in 1:20){
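  # Restore the seed %dorng% used for iteration i (stored on rn1 from the question).
  # This plain assignment only reaches the global .Random.seed when run at top level.
  .Random.seed <- attr(rn1,"rng")[[i]]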
  rn6[i] <- (rnorm(1,0,1))
}
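The idea of one seed per iteration can be sketched without doRNG, using nextRNGStream() from the parallel package that ships with R. This is only an illustration of the technique: make_streams and draw_per_stream are hypothetical helpers, and whether the seeds built here coincide with the ones stored in attr(rn1, "rng") depends on doRNG's internals and is not verified here.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Hypothetical helper: build n L'Ecuyer-CMRG stream seeds from one base seed.
# Side effect: switches the session's generator to L'Ecuyer-CMRG.
make_streams <- function(n, seed) {
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  s <- get(".Random.seed", envir = .GlobalEnv)
  streams <- vector("list", n)
  for (i in seq_len(n)) {
    streams[[i]] <- s
    s <- nextRNGStream(s)  # jump ahead to the next independent stream
  }
  streams
}

# Hypothetical helper: reseed from each stream, then draw one normal.
# assign() targets the global .Random.seed, so this also works inside functions.
draw_per_stream <- function(streams) {
  vapply(streams, function(s) {
    assign(".Random.seed", s, envir = .GlobalEnv)
    rnorm(1)
  }, numeric(1))
}

old_kind <- RNGkind()

streams <- make_streams(20, seed = 123)
a <- draw_per_stream(streams)
b <- draw_per_stream(streams)

# Plain sequential draws from the same base seed and the same generator.
set.seed(123)
seq_draws <- rnorm(20)

print(identical(a, b))            # replaying the same streams
print(a[1] == seq_draws[1])       # first stream starts at the base seed
print(identical(a, seq_draws))    # per-stream draws vs. one sequential stream

RNGkind(old_kind[1])  # restore the previous generator
```

Replaying the stored seeds reproduces the per-iteration draws, while a single sequential stream only shares its starting point with the first iteration. That is consistent with the observation in the question that only the first number matched.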


 