Why am I getting different bootstrap results using different algorithms?
I am using two different methods to generate a bootstrap sample:
import numpy as np

np.random.seed(335)
y = np.random.normal(0, 1, 5)
b = np.empty(len(y))  # initializes an empty vector
for j in range(len(y)):
    # Intended to draw a random integer from 1 to n, where n is our sample size
    # (note: randint excludes the upper bound, so this actually yields 1 to n-1)
    a = np.random.randint(1, len(y))
    b[j] = y[a - 1]  # indices in Python start at zero, the worst part of Python in my opinion
c = np.random.choice(y, size=5)
print(b)
print(c)
and for my output I get different results:
[1.04749432 1.71963433 1.71963433 1.71963433 1.71963433]
[-0.25224454 -0.25224454 0.46604474 1.71963433 0.46604474]
I think the answer has something to do with the random number generator, but I'm confused as to the exact reason.
This comes down to the use of different algorithms for randomized selection. There are numerous equivalent ways to select items at random with replacement using a pseudorandom generator (or to generate random variates from any other distribution). In particular, the algorithm for numpy.random.choice need not, in theory, make use of numpy.random.randint. What matters is that these equivalent ways should produce the same distribution of random variates. In the case of NumPy, look at NumPy's source code.
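As a rough illustration of "same distribution, possibly different draws", the sketch below compares the relative frequencies produced by the two procedures over many draws. The array `y`, the seed, and the number of draws are arbitrary values chosen for the example, not taken from the question.

```python
import numpy as np

# Hypothetical data and sizes, chosen only for illustration.
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
n_draws = 100_000

# Procedure A: draw integer indices, then index into y.
np.random.seed(12345)
idx = np.random.randint(0, len(y), size=n_draws)  # upper bound is exclusive
sample_a = y[idx]

# Procedure B: let choice pick the elements directly (with replacement).
np.random.seed(12345)
sample_b = np.random.choice(y, size=n_draws)

# Relative frequency of each value of y under each procedure.
freq_a = np.array([(sample_a == v).mean() for v in y])
freq_b = np.array([(sample_b == v).mean() for v in y])
print(freq_a)
print(freq_b)
```

Both frequency vectors should come out close to 0.2 per element, whether or not the individual draws coincide.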
Another, less important, reason for different results is that the two different selection procedures (randint and choice) produce pseudorandom numbers themselves, which can differ from each other because the selection procedures didn't begin with the same seed (more precisely, the same sequence of pseudorandom numbers). If we set the seed to the same value before beginning each procedure:
import numpy as np

np.random.seed(335)
y = np.random.normal(0, 1, 5)
b = np.empty(len(y))
np.random.seed(999999)  # Seed selection procedure 1
for j in range(len(y)):
    a = np.random.randint(1, len(y))
    b[j] = y[a - 1]
np.random.seed(999999)  # Seed selection procedure 2
c = np.random.choice(y, size=5)
print(b)
print(c)
then each procedure will begin with the same pseudorandom numbers. But even so, the two procedures may use different algorithms for random selection, and these differences may still lead to different results.
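To make the point concrete, here is a minimal sketch of two index-selection algorithms that start from the same seed yet turn the underlying pseudorandom numbers into indices in different ways. The second method (scaling a uniform variate and taking the floor) is a textbook alternative chosen for illustration; it is not claimed to be what numpy.random.choice does internally.

```python
import numpy as np

n_items = 5   # size of the population being sampled (arbitrary)
n_draws = 10  # number of indices to draw (arbitrary)

# Algorithm 1: ask the generator for integers directly.
np.random.seed(999999)
idx_randint = np.random.randint(0, n_items, size=n_draws)

# Algorithm 2: draw uniform floats in [0, 1), scale, and take the floor.
np.random.seed(999999)
u = np.random.random_sample(n_draws)
idx_floor = np.floor(u * n_items).astype(int)

print(idx_randint)
print(idx_floor)
```

Both algorithms pick each index 0 to 4 with equal probability, but they are not required to produce the same sequence of indices from the same seed.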
(However, numpy.random.* functions, such as randint and choice, have become legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility reasons. That version didn't deprecate any numpy.random.* functions, though, so they are still available for the time being. See also this question. In newer applications you should make use of the new system introduced in version 1.17, including numpy.random.Generator, if you have that version or later. One advantage of the new system is that the application relies less on global state.)
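A minimal sketch of the same bootstrap draw with the newer API, assuming NumPy 1.17 or later. The seed 335 is reused from the question only for familiarity; the values produced will not match the legacy np.random.seed output. Note also that Generator.integers, like the legacy randint, excludes the upper bound by default, so the question's randint(1, len(y)) followed by y[a-1] can never select the last element of y.

```python
import numpy as np

# A local Generator object replaces the global state behind np.random.seed.
rng = np.random.default_rng(335)
y = rng.normal(0, 1, 5)

# Option 1: draw indices explicitly. The upper bound is exclusive,
# so this covers every valid index 0 .. len(y) - 1.
idx = rng.integers(0, len(y), size=len(y))
b = y[idx]

# Option 2: sample the elements directly, with replacement.
c = rng.choice(y, size=len(y), replace=True)

print(b)
print(c)
```

Either option is a valid bootstrap resample of y; as with the legacy functions, there is no reason to expect b and c to be equal.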