
Understanding set.seed() in R

I am using R and am having trouble replicating the output of the following:

mod1 <- glm(TVAR ~ .,data = df, family = "binomial")
y <- predict.glm(mod1)

due to its dependency on set.seed().

I have some questions related to this:

  • I am aware that if I first call set.seed(123) (or any other seed), random number generation always starts from the same state, so I get a replicable result. Nevertheless, suppose I want to reverse engineer the seed: start from a good result and retrieve the seed that produced it, so that I can replicate that result next time. In other words, assume I run the same code n times without setting a seed beforehand, pick the result that suits me best, and then want to retrieve the seed that was used. Would that be possible? It may sound like cheating, but it is not: I am only trying to pin down the seed-dependency of the code's results, on the assumption that the overall idea behind the code is sound and the code only needs to be made replicable.

  • Just for my understanding: is a new seed only used when I delete all the variables in the environment? In fact, if I run the same code more than once without cleaning the environment, the results are the same, so the same seed must have been used. I would appreciate some clarity on this.

  • Lastly: is there a way to tell when a function depends on set.seed()? For instance, I could not find any indication of this in the CRAN manual, even though it seems to be a crucial issue.

@KonradRudolph gives a good answer here. I'd just like to add one point to it:

There are three ways to set the random seed, and they are not the same:

  • Using set.seed(n) sets it to an easily reproducible state.
  • Calling any of the internal random number generators also changes it, in a deterministic but less predictable way.
  • Saving it and restoring it later sets it to the earlier state.

In general, set.seed() can only produce a tiny fraction of the possible values of the random seed, whereas calling the RNG should (eventually) cycle through all of them. There are about 2^20000 different random seeds possible, but set.seed() can only create about 2^32 of them. (Both of these numbers are over-estimates, but the ratio is about right.)

You can save and restore the .Random.seed variable, or call set.seed(n) to set the random seed to a known state. The only feasible way to reproduce a particular state is to start in a known state and repeat the calls that led to the one you want.
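The three routes above can be sketched in a few lines. This is only an illustration that assumes R's default generator settings; the variable names (a, b, saved_state, b_again) are made up for the example.

```r
# Way 1: set.seed() puts the generator into a known state.
set.seed(123)
a <- rnorm(3)

# Way 2: every draw advances the state, so .Random.seed has moved on by now.
# Way 3, part one: save the current state before drawing again.
saved_state <- get(".Random.seed", envir = globalenv())
b <- rnorm(3)

# Way 3, part two: restore the saved state and repeat the same calls.
assign(".Random.seed", saved_state, envir = globalenv())
b_again <- rnorm(3)

identical(b, b_again)   # the restored state reproduces the same draws
```

Restoring only works if the state was saved before the run of interest, which is why starting from a known state is usually the more practical habit.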

To answer your points in turn:

Would [reverse engineering the seed from the result of a computation] be possible?

It depends on the actual random-number generator being used, but in general this is hard, because the state space of a good RNG is huge and you might have to search it exhaustively. Potentially this wouldn't just take hours but years.
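Rather than recovering a seed after the fact, one option is to record a known seed before every run, so the run you prefer can simply be replayed. The sketch below assumes a hypothetical run_once() standing in for "the same code", and the candidate range 1:50 is arbitrary.

```r
# Hypothetical stand-in for the computation; replace with the real code.
run_once <- function() mean(rnorm(20))

candidate_seeds <- 1:50
scores <- vapply(candidate_seeds, function(s) {
  set.seed(s)   # known starting state for this particular run
  run_once()
}, numeric(1))

# Keep the seed that belongs to the preferred result.
best_seed <- candidate_seeds[which.max(scores)]

# Replaying the recorded seed reproduces that run exactly.
set.seed(best_seed)
identical(run_once(), max(scores))
```

Whether picking a seed by its result is statistically defensible is a separate question; the sketch only shows that the bookkeeping is cheap compared with searching the state space.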

a new seed is only used when I delete all the variables in the environment?

A new seed is used whenever you invoke set.seed. The current seed value is stored in the hidden variable .Random.seed in the global environment. However, removing that variable won't make your last computation reproducible, since R re-initialises the seed from a non-deterministic value (in actual fact, the current operating system time).
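A minimal sketch of that behaviour, assuming a fresh session with default settings (first, second, and third are illustrative names):

```r
set.seed(42)
first <- runif(2)

# Deleting the stored state does not rewind anything: the next draw makes R
# create a fresh seed, so `second` is not tied to the earlier computation.
rm(list = ".Random.seed", envir = globalenv())
second <- runif(2)

# Only going back to the known state reproduces the first draws.
set.seed(42)
third <- runif(2)
identical(first, third)
```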

if I run the same code more than once but without cleaning the environment, the results are the same, hence the same seed was used.

No: consuming random values (by calling a stochastic function) changes the random seed. So running multiple computations in a row without cleaning the environment does not produce the same result. In fact, that would be terrible. You can see this easily yourself:

> rnorm(1)
[1] -0.3156453
> rnorm(1)
[1] 0.7345465

… clearly, two consecutive calls of a stochastic function (here, rnorm) did not produce the same result, even though I didn't clean the environment in between calls.

is there a way to understand when a function is dependent on set.seed()?

You could set different random seeds, rerun the function, and see if the output changes.

Apart from that, there is no general, straightforward way to do this. If the function does not document its dependency on set.seed, then your only recourse is to look carefully at the source code of the function (and of all the functions it calls in turn).
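A complementary empirical check is to compare .Random.seed before and after the call. The helper below, consumes_rng(), is hypothetical and not part of R; it is a rough sketch that resets the session's seed as a side effect, and a function that saves and restores the state itself, or only draws random numbers on some code paths, would slip past it.

```r
# Hypothetical helper: TRUE if evaluating f() changed the RNG state.
consumes_rng <- function(f) {
  set.seed(1)   # make sure a state exists (this overwrites the current seed)
  before <- get(".Random.seed", envir = globalenv())
  f()
  after <- get(".Random.seed", envir = globalenv())
  !identical(before, after)
}

consumes_rng(function() rnorm(1))     # draws a random number
consumes_rng(function() mean(1:10))   # purely deterministic
```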


Bonus (as noted by Roland in the comments):

glm and predict.glm are not stochastic functions, and do not depend on set.seed.
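This is easy to confirm by fitting the same model under two different seeds. The sketch uses the built-in mtcars data and an arbitrary formula purely for illustration, since the df and TVAR from the question are not available.

```r
# mtcars ships with R; the model itself is only an illustration.
set.seed(1)
fit1 <- glm(am ~ wt + hp, data = mtcars, family = "binomial")
p1 <- predict(fit1)

set.seed(2)
fit2 <- glm(am ~ wt + hp, data = mtcars, family = "binomial")
p2 <- predict(fit2)

identical(coef(fit1), coef(fit2))   # same coefficients under both seeds
identical(p1, p2)                   # same predictions under both seeds
```

If results from code like the question's do change between runs, the randomness must come from somewhere else, for example a sampling or train/test split step that builds df.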
