How to ensure a generated list of numbers follows a uniform distribution

Question

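I have a list of 150 numbers from 0 to 149. I would like to use a for loop with 150 iterations to generate 150 lists of 6 numbers, such that the list produced in iteration k contains the number k plus 5 different random numbers. For example: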

S0 = [0, r1, r2, r3, r4, r5] # r1, r2,..., r5 are random numbers between 0 and 150
S1 = [1, r1', r2', r3', r4', r5'] # r1', r2',..., r5' are new random numbers between 0 and 150
...
S149 = [149, r1'', r2'', r3'', r4'', r5'']

In addition, the numbers in each list must be distinct and at least 5 apart from one another. This is the code I am using:

import random
import numpy as np

final_list = []
for k in range(150):
    S = [k]  # every list starts with its own index k
    for it in range(5):
        # Candidates: every number not already in S
        domain = [ele for ele in range(150) if ele not in S]
        d = 0
        x = k
        # Redraw until x is at least 5 away from every number in S
        while d < 5:
            d = np.inf
            x = random.sample(domain, 1)[0]
            for ch in S:
                if np.abs(ch - x) < d:
                    d = np.abs(ch - x)
        S.append(x)
    final_list.append(S)

Output:

[[0, 149, 32, 52, 39, 126],
 [1, 63, 16, 50, 141, 79],
 [2, 62, 21, 42, 35, 71],
...
 [147, 73, 38, 115, 82, 47],
 [148, 5, 78, 115, 140, 43],
 [149, 36, 3, 15, 99, 23]]

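The code works, but I would like to know whether it is possible to force each number to be repeated roughly the same number of times across all iterations. For example, after running the code above, this plot shows how many times each number appeared in the generated lists:

As you can see, some numbers appear more than 10 times, while others appear only 2 times. Is it possible to reduce this variation so that the plot approximates a uniform distribution? Thanks.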

Answer 1

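First, I am not sure that your assertion that the current results are not uniformly distributed is necessarily correct. To me, it would seem prudent to examine the histogram over many repetitions of the process, not just one.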
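One way to look at many repetitions is to average the per-number counts over repeated runs. The following is only a sketch: `generate_lists` is a condensed stand-in for the question's procedure (it draws from the whole range and rejects anything closer than 5, which should select among the same valid candidates), and the names `mean_counts`, `runs` and `seed` are made up for illustration.

```python
import random
from collections import Counter

N = 150        # numbers 0..149, as in the question
PICKS = 5      # random numbers added to each list
MIN_DIST = 5   # minimum distance between numbers in one list

def generate_lists(rng):
    """One run: list k holds k plus PICKS random numbers, all at least MIN_DIST apart."""
    lists = []
    for k in range(N):
        s = [k]
        while len(s) < PICKS + 1:
            x = rng.randrange(N)
            # Accept x only if it is far enough from every number already chosen
            if all(abs(x - ch) >= MIN_DIST for ch in s):
                s.append(x)
        lists.append(s)
    return lists

def mean_counts(runs=200, seed=0):
    """Average number of appearances of each value over `runs` independent runs."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(runs):
        for s in generate_lists(rng):
            total.update(s)
    return [total[v] / runs for v in range(N)]

counts = mean_counts()
print(min(counts), max(counts))  # spread of the averaged counts
```

Each run places 900 numbers in total, so the averaged counts have a mean of 6 per value. If the minimum and maximum of the averaged counts sit close to 6, the spread seen in a single run is mostly sampling noise.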

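I am not a statistician, but when I want to approximate a uniform distribution (and assuming the functions in random provide a uniform distribution), what I try to do is simply accept every result the random functions return. To do that, I need to restrict the choices given to those functions before calling them. This is how I would go about your task: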

import random
import numpy as np

N = 150

def random_subset(n):
    result = []
    cands = set(range(N))
    for i in range(6):
        result.append(n)                  # Initially, n is the number that must appear in the result
        cands -= set(range(n - 4, n + 5)) # Remove candidates less than 5 away 
        n = random.choice(list(cands))    # Select next number
    return result

result = np.array([random_subset(n) for n in range(N)])
print(result)

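Put simply, whenever I add a number n to the result, I remove a neighbourhood of the appropriate size around it from the candidates, so that no number less than 5 away from it can be chosen later.

The code is not optimized (it converts the set to a list several times), but it works (as far as I understand it).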
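To double-check the "it works" part, a small validator can confirm both requirements from the question. This is a sketch, and `check_lists` is a hypothetical helper name, not part of the code above.

```python
from itertools import combinations

def check_lists(lists, min_dist=5):
    """True if list k starts with k and every pair of its numbers is at least min_dist apart."""
    for k, s in enumerate(lists):
        if s[0] != k:
            return False
        # A distance of 0 also fails, so this covers the "distinct" requirement
        if any(abs(a - b) < min_dist for a, b in combinations(s, 2)):
            return False
    return True

print(check_lists([[0, 10, 20], [1, 30, 40]]))  # valid toy input
print(check_lists([[0, 3, 20]]))                # 0 and 3 are too close
```

With the code above, `check_lists(result.tolist())` should return True on every run.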

Answer 2

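If you like, you can force it to be perfectly uniform.

Apologies for the mix of globals and locals; this seemed the most readable. You may want to rewrite it depending on how variable your constants are =)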

import random

SIZE = 150
SAMPLES = 5

def get_samples():
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        item = [i] + selection
        items.append(item)
    return items

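You will then have exactly 5 of each number (plus one more in the leading position, which makes for an odd data structure).

>>> import collections
>>> set(collections.Counter(vv for v in get_samples() for vv in v).values())
{6}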

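The approach above does not guarantee that the last 5 numbers are unique; in fact, you would expect ~10/150 of the lists to contain a duplicate. If that matters, you will need to filter your distribution further and decide how much weight you give to tight uniformity, duplicates, and so on.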
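The duplicate rate can be estimated empirically rather than taken on faith. The sketch below repeats the shuffle-and-slice step many times and reports the fraction of 5-number selections that contain a repeated value; `duplicate_fraction`, `trials` and `seed` are illustrative names. It looks only at the five random entries and ignores clashes with the leading number.

```python
import random

SIZE = 150
SAMPLES = 5

def duplicate_fraction(trials=200, seed=0):
    """Fraction of 5-number selections that contain at least one repeated value."""
    rng = random.Random(seed)
    dup = 0
    for _ in range(trials):
        pool = list(range(SIZE)) * SAMPLES
        rng.shuffle(pool)
        for i in range(SIZE):
            selection = pool[i * SAMPLES:(i + 1) * SAMPLES]
            if len(set(selection)) < SAMPLES:
                dup += 1
    return dup / (trials * SIZE)

print(duplicate_fraction())
```

Multiplying the printed fraction by 150 gives the expected number of affected lists per call.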

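If your numbers are roughly the same as the ones you gave above, you can also (fairly) patch up the result and hope to avoid long search times (which would not be the case if the SAMPLES size were closer to the OPTIONS size):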

def get_samples():
    pool = list(range(SIZE)) * SAMPLES
    random.shuffle(pool)
    i = 0
    while i < len(pool):
        if i % SAMPLES == 0:
            seen = set()
        v = pool[i]
        if v in seen:  # swap
            dst = random.randrange(len(pool))
            pool[dst], pool[i] = pool[i], pool[dst]
            i = min(i, dst)
            i -= i % SAMPLES  # Restart from the earlier of the two affected segments
        else:
            seen.add(v)
            i += 1
    items = []
    for i in range(SIZE):
        selection, pool = pool[:SAMPLES], pool[SAMPLES:]
        assert len(set(selection)) == SAMPLES, selection
        item = [i] + selection
        items.append(item)
    return items

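This usually takes fewer than 5 passes to clean up any duplicates, and should leave all arrangements that satisfy your conditions equally likely.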
