
Generating random words using Python

I have a list of words:

count=100    
list = ['apple','orange','mango']

For the count above, is it possible to use the random function to select apple 40% of the time, orange 30% of the time, and mango 30% of the time?

For example:

For count=100: 40 times apple, 30 times orange and 30 times mango.

The selection has to happen randomly.

Based on an answer to the question about generating discrete random variables with specified weights, you can use numpy.random.choice to get code that is 20 times faster than with random.choice:

from numpy.random import choice

sample = choice(['apple','orange','mango'], p=[0.4, 0.3, 0.3], size=1000000)

from collections import Counter
print(Counter(sample))

Outputs:

Counter({'apple': 399778, 'orange': 300317, 'mango': 299905})

Not to mention that it is actually easier than "to build a list in the required proportions and then shuffle it".

Also, shuffle would always produce exactly 40% apple, 30% orange and 30% mango, which is not the same as saying "produce a sample of a million fruits according to a discrete probability distribution". The latter is what both choice solutions do (and the bisect one too). As can be seen above, there are about 40% apples, etc., when using numpy.
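On Python 3.6 or later, the standard library also accepts weights directly through random.choices, so a discrete distribution can be sampled without numpy. The following is only a minimal sketch; the variable names and the sample size of 100 are chosen for illustration:

```python
import random
from collections import Counter

fruits = ['apple', 'orange', 'mango']
weights = [0.4, 0.3, 0.3]  # relative weights; they do not have to sum to 1

# Draw 100 words with replacement according to the weights.
sample = random.choices(fruits, weights=weights, k=100)

# The counts are only approximately 40/30/30 and change from run to run.
counts = Counter(sample)
print(counts)
```

As with the choice solutions above, each draw is independent, so the proportions are approximate rather than exact.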

The easiest way is to build a list in the required proportions and then shuffle it.

>>> import random
>>> result = ['apple'] * 40 + ['orange'] * 30 + ['mango'] * 30
>>> random.shuffle(result)
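To see that shuffling changes only the order and not the proportions, the result can be tallied with collections.Counter. This is a small check written as a script rather than at the prompt; the name counts is introduced here for illustration:

```python
import random
from collections import Counter

# Build the list in exact proportions, then randomize the order in place.
result = ['apple'] * 40 + ['orange'] * 30 + ['mango'] * 30
random.shuffle(result)

# Shuffling never adds or removes items, so the tallies stay exact.
counts = Counter(result)
print(counts['apple'], counts['orange'], counts['mango'])  # 40 30 30
```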

Edit for the new requirement that the count is really 1,000,000:

>>> count = 1000000
>>> pool = ['apple'] * 4 + ['orange'] * 3 + ['mango'] * 3
>>> for i in range(count):  # range replaces Python 2's xrange
...     print(random.choice(pool))

A slower but more general alternative approach is to bisect a cumulative probability distribution:

>>> import bisect
>>> choices = ['apple', 'orange', 'mango']
>>> cum_prob_dist = [0.4, 0.7]
>>> for i in range(count):  # range replaces Python 2's xrange
...     print(choices[bisect.bisect(cum_prob_dist, random.random())])
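The cumulative distribution does not have to be written out by hand. One possible generalization, sketched here under the assumption that the weights are non-negative with a positive total, builds it with itertools.accumulate; make_picker and pick are hypothetical helper names, not part of any library:

```python
import bisect
import random
from itertools import accumulate

def make_picker(choices, weights):
    """Return a function that draws one item per call, weighted by `weights`."""
    cum_weights = list(accumulate(weights))  # e.g. [4, 3, 3] -> [4, 7, 10]
    total = cum_weights[-1]
    last = len(choices) - 1

    def pick():
        # Scale the uniform draw by the total so weights need not sum to 1.
        x = random.random() * total
        # Limiting the search to [0, last) keeps the index within bounds.
        return choices[bisect.bisect(cum_weights, x, 0, last)]

    return pick

pick = make_picker(['apple', 'orange', 'mango'], [4, 3, 3])
sample = [pick() for _ in range(1000)]
```

Computing the cumulative weights once and reusing them keeps each draw at a single binary search.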
