
How to make sure that a list of generated numbers follows a uniform distribution

I have a list of 150 numbers from 0 to 149. I would like to use a for loop with 150 iterations to generate 150 lists of 6 numbers such that, in each iteration k, the list contains the number k as well as 5 different random numbers. For example:

S0 = [0, r1, r2, r3, r4, r5] # r1, r2,..., r5 are random numbers between 0 and 149
S1 = [1, r1', r2', r3', r4', r5'] # r1', r2',..., r5' are new random numbers between 0 and 149
...
S149 = [149, r1'', r2'', r3'', r4'', r5''] 

In addition, the numbers in each list have to be distinct, with a minimum distance of 5 between any two of them. This is the code I am using:

import random
import numpy as np

final_list = []
for k in range(150):
    S = [k]
    for it in range(5):
        domain = [ele for ele in range(150) if ele not in S]
        d = 0
        x = k
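        # Redraw x until it is at least 5 away from every number already in S
        while d < 5:
            d = np.inf  # np.Infinity was removed in NumPy 2.0; np.inf works in all versions
            x = random.choice(domain)
            for ch in S:
                if np.abs(ch - x) < d:
                    d = np.abs(ch - x)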
        S.append(x)
    final_list.append(S)

Output:

[[0, 149, 32, 52, 39, 126],
 [1, 63, 16, 50, 141, 79],
 [2, 62, 21, 42, 35, 71],
...
 [147, 73, 38, 115, 82, 47],
 [148, 5, 78, 115, 140, 43],
 [149, 36, 3, 15, 99, 23]]

Now, the code works, but I would like to know whether it is possible to force the number of times each number appears across all iterations to be approximately the same. For example, after running the code above, this plot shows how many times each number appeared in the generated lists:

[Plot: number of times each number appears in the generated lists]

As you can see, some numbers appear more than 10 times while others appear only 2 times. Is it possible to reduce this variation so that the plot approximates a uniform distribution? Thanks.

First, I am not sure that your claim that the current results are not uniformly distributed is necessarily correct. It seems prudent to examine the histogram over several repetitions of the process, rather than just one.
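One way to follow that suggestion is to repeat the whole generation many times and look at the per-number counts averaged over the runs. The sketch below is an illustration under stated assumptions: `generate_lists` and `occurrence_counts` are hypothetical helpers, and `generate_lists` draws from the full range and rejects anything closer than 5 to the numbers already chosen. That accepts the same set of values with the same probabilities as the question's code, which first removes the already-chosen numbers (they are at distance 0, so they would be rejected anyway).

```python
import random
from collections import Counter

N = 150       # numbers 0..149
EXTRA = 5     # random numbers added to each list
MIN_DIST = 5  # minimum distance between any two numbers in a list


def generate_lists(rng):
    """One run of the question's procedure: list k starts with k, then 5 random picks."""
    lists = []
    for k in range(N):
        s = [k]
        while len(s) < EXTRA + 1:
            x = rng.randrange(N)
            # Keep x only if it is at least MIN_DIST away from every number already chosen
            if all(abs(x - c) >= MIN_DIST for c in s):
                s.append(x)
        lists.append(s)
    return lists


def occurrence_counts(runs, seed=0):
    """Total occurrences of each number 0..N-1 summed over `runs` independent runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        for s in generate_lists(rng):
            counts.update(s)
    return [counts[n] for n in range(N)]


if __name__ == "__main__":
    for runs in (1, 200):
        avg = [c / runs for c in occurrence_counts(runs)]
        print(f"runs={runs}: min avg={min(avg):.2f}, max avg={max(avg):.2f}")
```

With a single run, the gap between the least and most frequent numbers mixes sampling noise with any real bias; averaging over many runs shows whether some systematic unevenness remains once that noise is smoothed out.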

I am not a statistician, but when I want to approximate a uniform distribution (assuming that the functions in random produce uniformly distributed results), what I try to do is simply accept every result the random functions return. For that, I need to restrict the choices given to these functions before calling them. This is how I would approach your task:

import random
import numpy as np

N = 150

def random_subset(n):
    result = []
    cands = set(range(N))
    for i in range(6):
        result.append(n)                  # Initially, n is the number that must appear in the result
        cands -= set(range(n - 4, n + 5)) # Remove candidates less than 5 away 
        n = random.choice(list(cands))    # Select next number
    return result

result = np.array([random_subset(n) for n in range(N)])
print(result)

Simply put, whenever I add a number n to the result, I remove a window of the appropriate size around it from the selection candidates, so that no number at a distance of less than 5 can be selected later.
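A quick check of that window: removing `range(n - 4, n + 5)` from the candidates leaves exactly the numbers whose distance from `n` is at least 5, including near the ends of the range, where the window extends past 0 or 149 harmlessly. This loop is only an illustration and not part of the answer's code.

```python
N = 150

for n in range(N):
    cands = set(range(N))
    # The answer's update: drop the window n-4 .. n+4 around n
    after_window = cands - set(range(n - 4, n + 5))
    # Explicit rule: keep only numbers at distance >= 5 from n
    by_distance = {x for x in cands if abs(x - n) >= 5}
    assert after_window == by_distance

print("window removal matches the distance rule for every n")
```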

The code is not optimized (it performs multiple set-to-list conversions), but as far as I understand, it works.
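If the conversions matter, one possible variant keeps the candidates in a plain list, filters it with a comprehension after each pick, and stops as soon as six numbers are chosen, which also drops the extra `random.choice` that the original loop makes after appending the sixth number. This is a sketch, not the answer's code: `random_subset_list` and the constraint checker `satisfies_constraints` are hypothetical names. Each pick is still uniform over the remaining candidates.

```python
import random

N = 150
SIZE_PER_LIST = 6
MIN_DIST = 5


def random_subset_list(n, rng=random):
    """Like the answer's random_subset, but without set/list round-trips."""
    result = [n]
    cands = [c for c in range(N) if abs(c - n) >= MIN_DIST]
    while len(result) < SIZE_PER_LIST:
        n = rng.choice(cands)  # uniform over the remaining candidates
        result.append(n)
        # Drop everything closer than MIN_DIST to the number just picked
        cands = [c for c in cands if abs(c - n) >= MIN_DIST]
    return result


def satisfies_constraints(lists):
    """True if list k starts with k, has 6 entries, and all pairs are >= MIN_DIST apart."""
    for k, s in enumerate(lists):
        if s[0] != k or len(s) != SIZE_PER_LIST:
            return False
        if any(abs(a - b) < MIN_DIST for i, a in enumerate(s) for b in s[i + 1:]):
            return False
    return True


lists = [random_subset_list(n) for n in range(N)]
print(satisfies_constraints(lists))
```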

You can force it to be precisely uniform, if you so desire.如果您愿意,您可以强制它完全一致。

Apologies for the mix of globals and locals; this seemed the most readable. You may want to rewrite it depending on how variable your constants are =)

import random

SIZE = 150
SAMPLES = 5

def get_samples():
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        item = [i] + selection
        items.append(item)
    return items

Then you will have exactly 5 of each number (and one more in the leading position, which is a weird data structure).

>>> import collections
>>> set(collections.Counter(vv for v in get_samples() for vv in v).values())
{6}

The method above does not guarantee that the last 5 numbers in each list are unique; in fact, you would expect roughly 10 of the 150 lists to contain a duplicate. If that is important, you need to filter your distribution a little more and decide how much you value tight uniformity, avoiding duplicates, and so on.
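To check that duplicate estimate empirically, the sketch below runs the answer's first `get_samples` (same logic, with comments added) many times and measures the fraction of rows whose five sampled numbers contain a repeat. The helper `duplicate_fraction`, the seed, and the number of trials are assumptions chosen for illustration. It also asserts, on every run, that each number appears exactly 6 times.

```python
import random
from collections import Counter

SIZE = 150
SAMPLES = 5


def get_samples():
    # Each number appears SAMPLES times in a shuffled pool, cut into SIZE chunks
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        items.append([i] + selection)
    return items


def duplicate_fraction(trials=500, seed=0):
    """Fraction of rows (over all trials) whose SAMPLES random entries are not all distinct."""
    random.seed(seed)
    bad = 0
    for _ in range(trials):
        rows = get_samples()
        # SAMPLES copies in the pool plus one leading copy per number
        assert set(Counter(v for row in rows for v in row).values()) == {SAMPLES + 1}
        bad += sum(len(set(row[1:])) < SAMPLES for row in rows)
    return bad / (trials * SIZE)


print(f"fraction of rows with a duplicate: {duplicate_fraction():.4f}")
```

For comparison, a rough pairwise count: each of the 10 pairs in a row is equal with probability 4/749 (the other 4 copies of a value among the remaining 749 pool entries), giving about 10 * 4/749 ≈ 0.053 per row, or roughly 8 of 150 rows, the same order as the estimate above.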

If your numbers are roughly the ones you gave above, you can also patch up the results (fairly) and hope to avoid long search times (this is not the case when SAMPLES is closer to SIZE):

def get_samples():
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    i = 0
    while i < len(pool):
        if i % SAMPLES == 0:
            seen = set()
        v = pool[i]
        if v in seen:  # swap
            dst = random.randrange(len(pool))  # Any position in the pool, not only the first SIZE
            pool[dst], pool[i] = pool[i], pool[dst]
            j = min(i, dst)
            i = j - j % SAMPLES  # Restart from the earlier of the two swapped segments
        else:
            seen.add(v)
            i += 1
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        assert len(set(selection)) == SAMPLES, selection
        item = [i] + selection
        items.append(item)
    return items

This will typically take fewer than 5 passes to clean up any duplicates, and should leave all arrangements that satisfy your conditions equally likely.
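A self-contained sketch for checking a repair loop of this kind (the names `repair_pool` and `get_samples_repaired` are hypothetical). It draws the swap target from the whole pool of SIZE * SAMPLES positions and, after each swap, resumes from the earlier of the two segments that changed, so no segment is skipped. It then checks that every row has five distinct sampled numbers and that each number still appears exactly 6 times, since swaps never change how many copies of each value the pool holds.

```python
import random
from collections import Counter

SIZE = 150
SAMPLES = 5


def repair_pool(pool, rng):
    """Swap entries until no SAMPLES-long segment of `pool` repeats a value; return swap count."""
    i = 0
    swaps = 0
    while i < len(pool):
        if i % SAMPLES == 0:
            seen = set()  # start of a new segment
        v = pool[i]
        if v in seen:
            dst = rng.randrange(len(pool))  # any position in the pool
            pool[dst], pool[i] = pool[i], pool[dst]
            swaps += 1
            j = min(i, dst)
            i = j - j % SAMPLES  # re-check from the earlier affected segment
        else:
            seen.add(v)
            i += 1
    return swaps


def get_samples_repaired(seed=None):
    rng = random.Random(seed)
    pool = list(range(SIZE)) * SAMPLES
    rng.shuffle(pool)
    repair_pool(pool, rng)
    # Row i is [i] followed by the i-th segment of the repaired pool
    return [[i] + pool[i * SAMPLES:(i + 1) * SAMPLES] for i in range(SIZE)]


rows = get_samples_repaired(seed=0)
print(all(len(set(r[1:])) == SAMPLES for r in rows))
print(set(Counter(v for r in rows for v in r).values()))
```

Like the answer's version, this only makes the five sampled entries distinct; it does not enforce the question's minimum distance of 5 or keep the leading number out of the sampled entries.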

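Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.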

 