Am I misusing the numpy random number generator for bootstrapping?

I attempted to write some code to create a bootstrap distribution and, although it runs without errors, I'm not sure it is working correctly.

Some background: A student at the school where I teach has been systematically finding the combination to the locks on the laptops in our computer lab to screw with our computer teacher (who is, fortunately, not me). Each lock has three entries with the numbers 0-9, so I calculate that there are 10^3 possible combinations per lock. He kept detailed lists of the combinations he had already tried for each lock, so each successive attempt samples one combination without replacement.

I am trying to simulate this to get an idea of how many attempts he made to unlock all of these computers (there are 12 computers in the lab) by finding the expected number of attempts needed to unlock one. This sounds like a hypergeometric distribution to me. The code I wrote is:

import numpy as np

def lock_hg(N):

    final_counts = []
    for i in range(N):
        count = 1
        combs = list(np.arange(1,1001,1))
        guess = np.random.randint(1,1001)  # upper bound is exclusive, so 1001 covers 1..1000
        for k in range(1000):
            a = np.random.choice(combs, 1)
            if a == guess:
                final_counts.append(count)
                break
            else:
                count = count + 1
                combs.remove(a)

    return(final_counts)

The histogram plt.hist(final_counts) when lock_hg(1000) is called looks fairly uniform, with 40 or 50 attempts being just as common as 900 or 950. I thought it would look more like a normal distribution centered at 500. I'm not sure whether there is a problem with the code or I am just misunderstanding the math. Is this code appropriate for the problem? If not, how can I fix it? If it is working, is there a more efficient way to do this and, if so, what is it?

Imagine generating a grid of combinations, with each row representing a lock and each value in that row a possible combination for that lock. For example, suppose there are 10 locks and only 5 possible combinations per lock. You can generate them all in a random order like this:

In [42]: np.random.seed(2018) # to make the example reproducible
In [43]: grid = np.random.random((10,5)).argsort(axis=1); grid
Out[43]: 
array([[1, 3, 4, 0, 2],
       [4, 0, 2, 3, 1],
       [3, 4, 2, 0, 1],
       [2, 1, 3, 4, 0],
       [1, 3, 0, 4, 2],
       [1, 0, 4, 3, 2],
       [2, 0, 1, 3, 4],
       [2, 0, 3, 4, 1],
       [2, 3, 1, 0, 4],
       [2, 4, 0, 3, 1]])

Next, let's pick a random combination for each of the 10 locks:

In [48]: combo = np.random.choice(5, size=10, replace=True); combo
Out[48]: array([3, 2, 3, 3, 4, 4, 4, 3, 2, 3])

We can think of grid as indicating the order in which combinations are tried for each lock. And we can take combo to be the actual combination for each lock.

We can also visualize the location of the matches using:

plt.imshow((grid == combo[:, None])[::-1], origin='upper')


We can find the location of each successful match in our grid by using argmax:

In [73]: (grid == combo[:, None]).argmax(axis=1)
Out[73]: array([1, 2, 0, 2, 3, 2, 4, 2, 0, 3])

argmax returns the index (location) of the match in each row. These index numbers also indicate the number of attempts required to find each match. Well, almost. Since Python uses 0-based indexing, argmax returns 0 if the match occurs on the first attempt. So we need to add 1 to (grid == combo[:, None]).argmax(axis=1) to obtain the true number of attempts.

So, we are looking for the distribution of (grid == combo[:, None]).argmax(axis=1) + 1. Now that we've worked out the computation for 10 locks and 5 combinations, it is easy to increase this to, say, 10000 locks and 1000 combinations:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2018)

num_locks = 10000
num_combos = 1000

grid = np.random.random((num_locks, num_combos)).argsort(axis=1)
combo = np.random.choice(num_combos, size=num_locks, replace=True)
attempts = (grid == combo[:, None]).argmax(axis=1) + 1

plt.hist(attempts, density=True)
plt.show()


This method of picking a random location in the grid makes it clear that the distribution should be uniform: the right combo is just as likely to occur at the beginning as at the end, or at any location in between.
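The uniformity argument can also be checked exactly rather than by simulation. The following is a minimal sketch, not part of the original answer; the helper name attempt_distribution is made up for illustration. It multiplies out the chance that every earlier guess failed by the chance that the current guess, drawn from the untried combinations, is the right one.

```python
from fractions import Fraction

def attempt_distribution(n):
    """Exact P(lock opens on attempt k), k = 1..n, guessing without replacement."""
    probs = []
    still_locked = Fraction(1)  # probability that all earlier attempts failed
    for k in range(1, n + 1):
        remaining = n - k + 1   # untried combinations before attempt k
        probs.append(still_locked * Fraction(1, remaining))
        still_locked *= Fraction(remaining - 1, remaining)
    return probs

probs = attempt_distribution(1000)
expected = sum(k * p for k, p in enumerate(probs, start=1))

print(set(probs))      # every attempt number has the same probability
print(expected)        # expected attempts for one lock
print(12 * expected)   # expected attempts for the 12 laptops in the question
```

Every entry comes out as 1/1000, so the expected number of attempts per lock is (1000 + 1) / 2 = 500.5, and about 6006 for all 12 locks.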

A uniform distribution is expected, yes. The code is fine.

A possible optimization would be to swap the chosen key with the last one in the list before removing it. This would avoid touching all the ones in between.
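One way the swap-then-remove idea might look is sketched below. This is an illustration under assumptions, not the asker's code: the function name attempts_to_unlock is hypothetical, combinations are numbered 0..999, and the standard-library random module is used in place of numpy.

```python
import random

def attempts_to_unlock(num_combos=1000, rng=random):
    """Count guesses until the lock opens, never repeating a guess."""
    combos = list(range(num_combos))
    target = rng.randrange(num_combos)
    count = 0
    while combos:
        count += 1
        i = rng.randrange(len(combos))  # index of a random untried combination
        if combos[i] == target:
            return count
        combos[i] = combos[-1]          # overwrite the failed guess with the last entry
        combos.pop()                    # drop the tail in O(1); nothing in between shifts
    raise AssertionError("unreachable: the target is always among the untried combos")

counts = [attempts_to_unlock() for _ in range(2000)]
print(sum(counts) / len(counts))  # should land near 500.5
```

Removing from the end of a list is constant time, whereas list.remove has to search for the value and then shift every later element.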

Two improvements you can make:

  1. Python has a built-in random number generator: https://docs.python.org/2/library/random.html
import random

for i in range(5):
    print(random.randint(0, 100))

10
38
53
83
23
  2. If you're trying to iterate through all possible combinations to get into something (like a lock), it's better to step through them one at a time in order instead of using a random number generator. I could be misunderstanding the question a bit, as I'm not sure whether you're trying to figure out how he did it.
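As a rough sketch of the second point (the function name sequential_attempts is hypothetical, and the lock's combination is assumed to be set uniformly at random), trying combinations in order gives the same kind of attempt counts as guessing randomly without repeats:

```python
import random

def sequential_attempts(num_combos=1000, rng=random):
    """Try 0, 1, 2, ... in order and return how many tries open the lock."""
    target = rng.randrange(num_combos)  # assumption: combination is uniformly random
    for attempts, guess in enumerate(range(num_combos), start=1):
        if guess == target:
            return attempts

counts = [sequential_attempts() for _ in range(5000)]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Under that assumption the number of attempts is simply the target's position plus one, so it is again uniform on 1..1000 with a mean near 500.5.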
