random.choices with cum_weights

Question

Please help me understand this better; I really don't get it. Take this as an example:

import random

my_list =       [9999, 45, 63, 19, 89, 5, 72]
cum_w =   [1, 9, 10, 9, 2, 12, 7]
d_rand = random.choices(my_list, cum_weights=cum_w, k=7)
sum = 0
for idx, i in enumerate(cum_w):
    if idx == 0:
        for i in cum_w: sum += i
    print(f"cum_weight for {my_list[idx]}\t= {i/sum}\tRandom={random.choices(my_list, cum_weights=cum_w, k=7)}")

Here is the output:

cum_weight for 9999     = 0.14  Random=[45, 45, 9999, 45, 45, 9999, 45]
cum_weight for 45       = 0.18  Random=[45, 45, 45, 45, 9999, 45, 45]
cum_weight for 63       = 0.2   Random=[45, 45, 45, 9999, 9999, 9999, 45]
cum_weight for 19       = 0.18  Random=[45, 45, 45, 45, 45, 45, 9999]
cum_weight for 89       = 0.04  Random=[9999, 45, 45, 45, 45, 9999, 45]
cum_weight for 5        = 0.24  Random=[45, 45, 45, 45, 45, 45, 45]
cum_weight for 72       = 0.14  Random=[45, 45, 9999, 45, 45, 45, 45]

The probability for 9 (cum_w[1] and cum_w[3]) is 0.18. Why does 45 (weight 9) appear so often?

I have read the random.choices documentation, but it did not really make sense to me.
How does cum_weights work? Please, I need a thorough understanding of this.

Answer 1

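You asked "Why does 45 (9) appear so often?" and "How does cum_weights work?" Answering the second question explains the first. Note that what follows is an implementation of one approach to this kind of problem. I am not claiming it is Python's implementation; it is meant to illustrate the concepts involved.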

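Let's first look at how values can be generated from cumulative weights, that is, a list in which the entry at each index is the sum of all weights up to and including that index.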

import random

# Given cumulative weights, convert them to proportions, then generate U ~ Uniform(0,1)
# random values to use in a linear search to generate values in the correct proportions.
# This is based on the well-known probability result that P{a<=U<=b} = (b - a) for
# 0 <= a < b <= 1.
def gen_cumulative_weighted(values, c_weights):   # values and c_weights must be lists of the same length
    # Convert cumulative weights to probabilities/proportions by dividing by the last value.
    # This yields a list of non-decreasing values between 0 and 1. Note that the last entry 
    # is always 1, so a Uniform(0, 1) random number will *always* be less than or equal to
    # some entry in the list.
    p = [w / c_weights[-1] for w in c_weights]
    while True:
        index = 0   # starting from the beginning of the list
        # The following three lines find the first index having the property u <= p[index].
        u = random.random()
        while u > p[index]:
            index += 1
        yield values[index]    # yield the corresponding value.

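As the comments point out, the weights are divided by the last (and largest) value, which scales them to a set of values between 0 and 1. These can be thought of as the right-hand endpoints of non-overlapping subranges, where the length of each subrange equals the corresponding scaled weight. (If that is not clear, sketch it on paper and you should see it quickly.) A generated Uniform(0,1) value falls in exactly one of these subranges, and by the well-known probability result the chance of landing in a given subrange equals that subrange's length.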
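To make the subrange picture concrete, here is a small sketch that turns the example's cumulative weights into left and right endpoints and prints each subrange's length. The helper name `subranges` is made up for this illustration and is not part of the answer's code.

```python
def subranges(c_weights):
    """Return the (left, right) endpoints of the subrange owned by each index."""
    total = c_weights[-1]
    rights = [w / total for w in c_weights]   # scaled cumulative weights
    lefts = [0.0] + rights[:-1]               # each range starts where the previous one ends
    return list(zip(lefts, rights))

cumulative = [1, 10, 20, 29, 31, 43, 50]      # running sums of 1, 9, 10, 9, 2, 12, 7
for value, (lo, hi) in zip([9999, 45, 63, 19, 89, 5, 72], subranges(cumulative)):
    print(f"{value:>5}: ({lo:.2f}, {hi:.2f}]  length = {hi - lo:.2f}")
```

Each printed length should match the raw weight divided by 50, for example 9/50 = 0.18 for the value 45.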

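If we have raw weights rather than cumulative weights, all we need to do is convert them to cumulative weights and then hand the work off to the cumulative-weighted version of the generator: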

def gen_weighted(values, weights):   # values and weights must be lists of the same length
    cumulative_w = [sum(weights[:i+1]) for i in range(len(weights))]
    return gen_cumulative_weighted(values, cumulative_w)
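The slice-and-sum conversion above repeats the addition for every prefix. As a sketch of an alternative, the standard library's `itertools.accumulate` produces the same running totals in a single pass; `to_cumulative` is a name chosen here for illustration only.

```python
from itertools import accumulate

def to_cumulative(weights):
    """Running totals of the raw weights, computed in one pass."""
    return list(accumulate(weights))

print(to_cumulative([1, 9, 10, 9, 2, 12, 7]))   # [1, 10, 20, 29, 31, 43, 50]
```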

Now we are ready to use the generators:

my_values = [9999, 45, 63, 19, 89, 5, 72]
my_weights = [1, 9, 10, 9, 2, 12, 7]
good_gen = gen_weighted(my_values, my_weights)
print('Passing raw weights to the weighted implementation:')
print([next(good_gen) for _ in range(20)])

This produces results such as:

Passing raw weights to the weighted implementation:
[63, 5, 63, 63, 72, 19, 63, 5, 45, 63, 72, 19, 5, 89, 72, 63, 63, 19, 89, 45]
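Twenty draws are too few to judge the proportions by eye. One way to gain confidence is to draw many samples and compare the observed frequencies with weight / total. The sketch below restates the linear-search idea in a compact helper (`draw` is a hypothetical name, not part of the code above) and uses an arbitrarily chosen fixed seed so the run is repeatable.

```python
import random
from collections import Counter
from itertools import accumulate

def draw(values, c_weights, rng):
    """Return the value at the first index whose cumulative weight is >= u."""
    u = rng.random() * c_weights[-1]      # equivalent to scaling the weights down by the total
    for value, c in zip(values, c_weights):
        if u <= c:
            return value

values = [9999, 45, 63, 19, 89, 5, 72]
weights = [1, 9, 10, 9, 2, 12, 7]
c_weights = list(accumulate(weights))     # true cumulative weights

rng = random.Random(12345)                # fixed seed, chosen arbitrarily
n = 100_000
counts = Counter(draw(values, c_weights, rng) for _ in range(n))

for value, w in zip(values, weights):
    print(f"{value:>5}: expected {w / sum(weights):.3f}, observed {counts[value] / n:.3f}")
```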

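OK, so what happens if we pass raw weights to the cumulative-weighted version of the algorithm? The raw weights [1, 9, 10, 9, 2, 12, 7] get scaled by dividing by the last value and become [1/7, 9/7, 10/7, 9/7, 2/7, 12/7, 1]. When we generate u ~ Uniform(0, 1) and use it for the linear search through the scaled weights, it yields index 0 => 9999 with probability 1/7 and index 1 => 45 with probability 6/7! This happens because u is always ≤ 1 and therefore always less than 9/7. The linear search can never get past a scaled weight that is ≥ 1, which for your input means it can only produce the first two values, and with the wrong weights.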
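The 1/7 and 6/7 figures can be checked with exact fractions. The sketch below computes, for any list handed to the linear search as if it were cumulative, the probability that the search stops at each index. `selection_probabilities` is a hypothetical helper written for this illustration, and it assumes non-negative integer weights with a positive last entry.

```python
from fractions import Fraction

def selection_probabilities(c_weights):
    """Probability that a linear search for the first u <= p[i] stops at each index."""
    total = c_weights[-1]
    probs = []
    covered = Fraction(0)                    # portion of [0, 1] already claimed by earlier indices
    for w in c_weights:
        right = min(Fraction(w, total), Fraction(1))   # u never exceeds 1
        probs.append(max(right - covered, Fraction(0)))
        covered = max(covered, right)
    return probs

print(selection_probabilities([1, 9, 10, 9, 2, 12, 7]))      # raw weights misused as cumulative
print(selection_probabilities([1, 10, 20, 29, 31, 43, 50]))  # true cumulative weights
```

With the raw weights, all of the probability lands on the first two indices; with the true cumulative weights, each index gets its raw weight divided by 50.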

print('Passing raw weights to the cumulative weighted implementation:')
bad_gen = gen_cumulative_weighted(my_values, my_weights)
print([next(bad_gen) for _ in range(20)])

This produces results such as:

Passing raw weights to the cumulative weighted implementation:
[45, 45, 45, 45, 45, 45, 45, 9999, 45, 9999, 45, 45, 45, 45, 45, 9999, 45, 9999, 45, 45]
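Applied to the code in the question, the same reasoning suggests two ways to call `random.choices`: pass the raw weights through the `weights` parameter, or accumulate them first and pass the result as `cum_weights`. A minimal sketch, reusing the question's list and introducing `raw_w` as an illustrative name:

```python
import random
from itertools import accumulate

my_list = [9999, 45, 63, 19, 89, 5, 72]
raw_w = [1, 9, 10, 9, 2, 12, 7]           # relative weights, not running totals

# Option 1: let random.choices treat the numbers as relative weights.
print(random.choices(my_list, weights=raw_w, k=7))

# Option 2: build the running totals explicitly and pass them as cum_weights.
cum_w = list(accumulate(raw_w))           # [1, 10, 20, 29, 31, 43, 50]
print(random.choices(my_list, cum_weights=cum_w, k=7))
```

Either call should produce all seven values over time, in roughly the proportions given by the raw weights.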

Solution 1
2  Accepted  2022-10-06 23:00:43