How to select an item from a list with known percentages in Python

Question

I wish to select a random word from a list where there is a known chance for each word, for example:

Fruit with Probability

Orange 0.10
Apple 0.05
Mango 0.15
etc

What would be the best way of implementing this? The actual list I will draw from is up to 100 items long, and the percentages do not add up to 100%: they fall short, to account for the items that had a really low chance of occurrence. I would ideally like to read this from a CSV file, which is where I store the data. This is not a time-critical task.

Thank you for any advice on how best to proceed.

Answer 1

You can pick items with weighted probabilities by assigning each item a number range proportional to its probability, picking a random number between zero and the sum of the ranges, and finding which item's range contains it. The following class does exactly that:

from random import random

class WeightedChoice(object):
    def __init__(self, weights):
        """Pick items with weighted probabilities.

            weights
                a sequence of (item, weight) tuples.
        """
        self._total_weight = 0.
        self._item_levels = []
        for item, weight in weights:
            self._total_weight += weight
            self._item_levels.append((self._total_weight, item))

    def pick(self):
        pick = self._total_weight * random()
        for level, item in self._item_levels:
            if level >= pick:
                return item

You can then load the CSV file with the csv module and feed it to the WeightedChoice class:

import csv

# Each CSV row is expected to hold two columns: item, weight
with open('file.csv', newline='') as f:
    weighted_items = [(item, float(weight)) for item, weight in csv.reader(f)]
picker = WeightedChoice(weighted_items)
print(picker.pick())
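
On Python 3.6 and later, the standard library's random.choices accepts a weights argument and can stand in for a hand-written class. The following is a minimal sketch, assuming a two-column CSV (item, weight) with no header row; the helper name load_weights and the sample data are made up for illustration, and an in-memory string stands in for the real file.

```python
import csv
import io
import random

def load_weights(csv_file):
    """Read (item, weight) rows from an open CSV file object."""
    items, weights = [], []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        item, weight = row
        items.append(item.strip())
        weights.append(float(weight))
    return items, weights

# Stand-in for open('file.csv'); the contents are illustrative only.
sample = io.StringIO("Orange,0.10\nApple,0.05\nMango,0.15\n")
items, weights = load_weights(sample)

# The weights are relative, so they do not have to sum to 1.
pick = random.choices(items, weights=weights, k=1)[0]
print(pick)
```

Note that relative weights spread any shortfall proportionally over the listed items, which is also what WeightedChoice does by scaling the random number by the total weight.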

Answer 2

What you want is to draw from a multinomial distribution. Assuming you have two lists, one of items and one of probabilities, and the probabilities sum to 1 (if not, just add a default item to cover the remainder):

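import random

def choose(items, chances):
    # Walk the cumulative probability p until it passes the random value x.
    p = chances[0]
    x = random.random()  # uniform in [0.0, 1.0)
    i = 0
    # The bounds check falls back to the last item if rounding (or a
    # shortfall in chances) would otherwise run past the end of the list.
    while x > p and i < len(items) - 1:
        i = i + 1
        p = p + chances[i]
    return items[i]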
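
The "default value" step mentioned above could be sketched as follows. This is one possible reading of that advice rather than the answerer's own code: the probability not covered by chances is assigned to a placeholder result, and bisect does the cumulative lookup. The function name choose_with_default and the 'other' placeholder are hypothetical.

```python
import bisect
import itertools
import random

def choose_with_default(items, chances, default=None):
    """Pick an item; probability missing from `chances` goes to `default`."""
    cumulative = list(itertools.accumulate(chances))
    x = random.random()  # uniform in [0.0, 1.0)
    # Index of the first cumulative value strictly greater than x
    i = bisect.bisect_right(cumulative, x)
    # i == len(items) means x landed in the uncovered remainder
    return items[i] if i < len(items) else default

fruits = ['Orange', 'Apple', 'Mango']
chances = [0.10, 0.05, 0.15]
print(choose_with_default(fruits, chances, default='other'))
```

With these numbers, 'other' would be returned about 70% of the time, since the listed chances only cover 0.30.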

Answer 3

lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69) ]

x = 0.0
lst2 = []
for fruit, chance in lst:
    tup = (x, fruit)
    lst2.append(tup)
    x += chance

tup = (x, None)
lst2.append(tup)

import random

def pick_one(lst2):
    if lst2[0][1] is None:
        raise ValueError("no valid values to choose")
    while True:
        r = random.random()
        for x, fruit in reversed(lst2):
            if x <= r:
                if fruit is None:
                    break  # try again with a different random value
                else:
                    return fruit

pick_one(lst2)

This builds a new list of ascending values, each marking the start of the range of random values that selects a fruit; pick_one() then walks backward through the list, looking for the first value that is <= the current random value. We put a "sentinel" value at the end of the list; if the chances don't reach 1.0, a random value can fall in the uncovered range, in which case it matches the sentinel and is rejected. random.random() returns a value in the range [0.0, 1.0), so it is certain to match something in the list eventually.

The nice thing here is that you can have one value with a 0.000001 chance of matching, and it should actually match with that frequency; the other solutions, where you build a list with the items repeated and just use random.choice() to pick one, would need a list with a million items in it to handle this case.
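
One consequence of the retry loop is worth spelling out: because a rejected draw is simply repeated, each fruit ends up being chosen with its chance divided by the total of all listed chances. A short sketch of that arithmetic, reusing the lst values from this answer (the names total and effective are introduced here for illustration):

```python
lst = [('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69)]

# Probability mass actually covered by the list (0.99 here)
total = sum(chance for _, chance in lst)

# Rejecting and retrying renormalises every chance by that total
effective = {fruit: chance / total for fruit, chance in lst}

for fruit, p in effective.items():
    print(fruit, round(p, 4))
```

So with a shortfall of 0.01, Orange is picked slightly more often than 10% of the time. If the missing percentage should instead mean "no fruit", return a default value in place of retrying.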

Answer 4

lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.69) ]

x = 0.0
lst2 = []
for fruit, chance in lst:
    low = x
    high = x + chance
    tup = (low, high, fruit)
    lst2.append(tup)
    x += chance

if x > 1.0:
    raise ValueError("chances add up to more than 100%")

low = x
high = 1.0
tup = (low, high, None)
lst2.append(tup)

import random

def pick_one(lst2):
    if lst2[0][2] is None:
        raise ValueError("no valid values to choose")
    while True:
        r = random.random()
        for low, high, fruit in lst2:
            if low <= r < high:
                if fruit is None:
                    break  # try again with a different random value
                else:
                    return fruit

pick_one(lst2)


# test it 10,000 times and count how often each fruit was picked
d = {}
for i in range(10000):
    x = pick_one(lst2)
    if x in d:
        d[x] += 1
    else:
        d[x] = 1
print(d)

I think this is a little clearer. Instead of the tricky representation of ranges as ascending values, we just keep the ranges themselves. Because we are testing ranges, we can simply walk forward through the lst2 values; there is no need to use reversed().
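
The tallying loop in this answer can be written more compactly with collections.Counter, and the counts turned into fractions that are easy to compare against the intended chances. A minimal sketch: observed_frequencies is a hypothetical helper, and demo_pick is only a stand-in picker so that the block runs on its own.

```python
import random
from collections import Counter

def observed_frequencies(pick, n=10000):
    """Call the zero-argument function `pick` n times; return {item: fraction}."""
    counts = Counter(pick() for _ in range(n))
    return {item: count / n for item, count in counts.items()}

# Stand-in picker for demonstration; swap in e.g. lambda: pick_one(lst2)
demo_pick = lambda: random.choices(['Orange', 'Apple'], weights=[2, 1])[0]

freqs = observed_frequencies(demo_pick)
for item, freq in sorted(freqs.items()):
    print(item, round(freq, 3))
```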

Answer 5

from numpy.random import multinomial
import numpy as np

def pickone(dist):
    return np.where(multinomial(1, dist) == 1)[0][0]

if __name__ == '__main__':
    lst = [ ('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.70) ]
    dist = [p[1] for p in lst]
    
    N = 10000
    draws = np.array([pickone(dist) for i in range(N)], dtype=int)
    hist = np.histogram(draws, bins=[i for i in range(len(dist)+1)])[0]
    for i in range(len(lst)):
        print(f'{lst[i]} {hist[i]/N}')
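
NumPy's Generator.choice takes the probabilities directly through its p argument, which avoids the np.where lookup. A sketch under the assumption that a reasonably recent NumPy (with np.random.default_rng) is available; the variable names are illustrative.

```python
import numpy as np

lst = [('Orange', 0.10), ('Apple', 0.05), ('Mango', 0.15), ('etc', 0.70)]
names = [name for name, _ in lst]
probs = np.array([p for _, p in lst])
probs = probs / probs.sum()  # p must sum to 1; renormalise in case of a shortfall

rng = np.random.default_rng()
one = rng.choice(names, p=probs)               # a single draw
many = rng.choice(names, size=10000, p=probs)  # 10,000 draws at once

# Observed frequency of each name
values, counts = np.unique(many, return_counts=True)
for value, count in zip(values, counts):
    print(value, count / many.size)
```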

Answer 6

One solution is to normalize the probabilities to integers and then repeat each element that many times (e.g. a list with 2 Oranges, 1 Apple, 3 Mangos). Picking from that list is then very easy with random.choice (from random import choice). If that is not practical, try the code here.
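
The normalization step could look like the sketch below. It is one way to do it, not necessarily what the answerer had in mind: each probability is parsed as an exact fraction, scaled by the least common denominator, and reduced by any common factor. The function name repeated_list is hypothetical.

```python
import random
from fractions import Fraction
from functools import reduce
from math import gcd

def repeated_list(chances):
    """Expand {item: probability} into a list with proportional repeats."""
    # Parse via str() so 0.15 becomes exactly 3/20, not its binary approximation
    fracs = {item: Fraction(str(p)) for item, p in chances.items()}
    # Least common denominator of all the fractions
    lcd = reduce(lambda a, b: a * b // gcd(a, b),
                 (f.denominator for f in fracs.values()), 1)
    counts = {item: int(f * lcd) for item, f in fracs.items()}
    # Divide out any common factor so the list is as short as possible
    common = reduce(gcd, counts.values(), 0) or 1
    return [item for item, c in counts.items() for _ in range(c // common)]

pool = repeated_list({'orange': 0.10, 'mango': 0.15, 'apple': 0.05})
print(pool)
print(random.choice(pool))
```

For the three fruits above this yields 2 oranges, 3 mangos and 1 apple. The list grows quickly once a probability needs many decimal places, which is the impractical case the answer alludes to.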

Answer 7

import random
d= {'orange': 0.10, 'mango': 0.15, 'apple': 0.05}
weightedArray = []
for k in d:
  weightedArray+=[k]*round(d[k]*100)  # round(), not int(): a product like 0.29*100 can fall just below 29
random.choice(weightedArray)

EDITS

This is essentially what Brian said above.

7 answers

Answer 1: score 2, accepted, 2009-10-12 19:22:28
Answer 2: score 2, 2009-10-12 19:43:30
Answer 3: score 1, 2009-10-12 19:05:42
Answer 4: score 0, 2009-10-12 19:39:23
Answer 5: score 0, 2020-10-22 15:50:22
Answer 6: score -1, 2009-10-12 18:55:22
Answer 7: score -1, 2009-10-12 18:59:24
