Generate a random weighted string file in Python
I'm trying to generate a string of length 3900 from the characters ['A','B','C','D','E'], where each character should appear with the following probability: {'A':0.1, 'B':0.3, 'C':0.3, 'D':0.1, 'E':0.2 }. I wrote the following code:
from random import random
from bisect import bisect
def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]
string_ = ''
for i in range(0,3900):
    string_ = string_ + weighted_choice([("A",10), ("B",30), ("C",30),("D",10),("E",20)])
with open("rand_file","w") as f:
    f.write(string_)
But it doesn't generate the string (file) according to those probabilities. It generates characters with frequencies like this:
C 0.2500264583
B 0.2499284457
E 0.1666428313
D 0.0833782424
A 0.0833758065
Probably because the for loop runs independently each time, without considering previous results.
Any help solving this problem, please?
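One way to check the generator is to count characters directly in the generated string. The five frequencies listed above add up to about 0.83 rather than 1, so the measurement may have included characters other than A to E. The sketch below reuses weighted_choice from the question; measure_frequencies is a hypothetical helper added only for illustration, and the sample size of 100,000 is an arbitrary assumption.

```python
from bisect import bisect
from collections import Counter
from random import random


def weighted_choice(choices):
    """Pick one value; each (value, weight) pair is chosen in proportion to its weight."""
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    return values[bisect(cum_weights, x)]


def measure_frequencies(text):
    """Return each character's share of `text` (hypothetical helper)."""
    counts = Counter(text)
    return {ch: counts[ch] / len(text) for ch in sorted(counts)}


pairs = [("A", 10), ("B", 30), ("C", 30), ("D", 10), ("E", 20)]
sample = "".join(weighted_choice(pairs) for _ in range(100_000))

# Shares should land near 0.1, 0.3, 0.3, 0.1, 0.2 and sum to 1.
for ch, share in measure_frequencies(sample).items():
    print(ch, round(share, 3))
```

If the shares printed here are close to the requested weights, the discrepancy is more likely in how the frequencies were measured than in the loop itself.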
If you just use the list ['A','B','B','B','C','C','C','D','E','E'] and choose an item from it at random, you can get rid of all that weighting code entirely, because the weighting is built into the list itself.
You can see that in the following example (yes, I don't doubt it could be written better, but it's only meant to be a proof of concept, not production-ready, pure-as-snow-white code):
from random import random, seed
def choice(lst):
    return lst[int(random() * len(lst))]
seed()
(a, b, c, d, e, t) = (0, 0, 0, 0, 0, 0)
for i in range(1000):
    x = choice('ABBBCCCDEE')
    if (x == 'A'): a += 1
    if (x == 'B'): b += 1
    if (x == 'C'): c += 1
    if (x == 'D'): d += 1
    if (x == 'E'): e += 1
    t += 1
print ("a =", a, "which is", a * 100 / t, "%")
print ("b =", b, "which is", b * 100 / t, "%")
print ("c =", c, "which is", c * 100 / t, "%")
print ("d =", d, "which is", d * 100 / t, "%")
print ("e =", e, "which is", e * 100 / t, "%")
With the output matching (roughly) the desired distribution:
a = 101 which is 10.1 %
b = 297 which is 29.7 %
c = 299 which is 29.9 %
d = 102 which is 10.2 %
e = 201 which is 20.1 %
Now that's obviously going to be annoying if your distribution is 99.9% A and 0.1% B (it'll be a rather long string passed to choice), but this should be adequate for the distribution you have.
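For skewed distributions where a repeated-letter list becomes unwieldy, the standard library's random.choices accepts weights directly (Python 3.6 or later). The following is a minimal sketch under that assumption; weighted_string and write_weighted_file are hypothetical helper names, not part of the answer above.

```python
import random


def weighted_string(length, weights):
    """Return `length` characters drawn independently according to `weights`."""
    letters = list(weights)
    # random.choices samples with replacement, so shares are approximate.
    return "".join(random.choices(letters, weights=list(weights.values()), k=length))


def write_weighted_file(path, length, weights):
    """Write a weighted random string to `path` and return it."""
    text = weighted_string(length, weights)
    with open(path, "w") as f:
        f.write(text)
    return text


probabilities = {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.1, "E": 0.2}
sample = weighted_string(3900, probabilities)
print(len(sample), {ch: sample.count(ch) for ch in probabilities})
```

Calling write_weighted_file("rand_file", 3900, probabilities) would produce the file the question asks for. Because each character is drawn independently, the counts vary slightly from run to run.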
You can generate all the letters according to the weighting, then randomly shuffle them, and finally join them. Something like:
from random import shuffle
N = 3900 # the string length
doc = {'A':0.1, 'B':0.3, 'C':0.3, 'D':0.1, 'E':0.2 } # weights
letters = []
for key in doc.keys():
    m = round(doc[key] * N) # number of copies of this letter; round() guards against float truncation
    letters.append(list(key * m))
letters = [item for sublist in letters for item in sublist] # flatten the list
shuffle(letters) # shuffle all letters randomly
result = ''.join(letters) # join all letters to make one string
print(len(result))
# 3900
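Note that this approach fixes the exact count of each letter, whereas drawing characters one at a time only matches the weights on average. When a weight times N is not a whole number, converting each product to an integer can leave the string shorter or longer than N. One possible way to handle that, sketched here with the hypothetical helpers exact_counts and shuffled_string, is to hand the leftover slots to the letters with the largest fractional parts:

```python
import random


def exact_counts(length, weights):
    """Split `length` into integer counts proportional to `weights`."""
    total = sum(weights.values())
    raw = {k: length * w / total for k, w in weights.items()}
    counts = {k: int(v) for k, v in raw.items()}  # int() floors since v >= 0
    leftover = length - sum(counts.values())
    # Give the remaining slots to the largest fractional parts.
    by_fraction = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts


def shuffled_string(length, weights):
    """Build a string with exact per-letter counts, then shuffle it."""
    counts = exact_counts(length, weights)
    letters = [k for k, n in counts.items() for _ in range(n)]
    random.shuffle(letters)
    return "".join(letters)


weights = {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.1, "E": 0.2}
print(exact_counts(3900, weights))
print(len(shuffled_string(1001, weights)))  # length is preserved even when 1001 * weight is fractional
```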
This is actually the same as paxdiablo's solution, except a little more general (for your simple example, his solution is better. +1):
import random
choice = [("A",10), ("B",30), ("C",30),("D",10),("E",20)]
choose_from = ''.join(x * letter for letter, x in choice)
print(choose_from)
# AAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDetc...
print(random.choice(choose_from))
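The snippet above draws a single character. To get the full 3900-character string from the question, the same draw can be repeated; the following is a small sketch along those lines, where the name result is an assumption made for illustration.

```python
import random

choice = [("A", 10), ("B", 30), ("C", 30), ("D", 10), ("E", 20)]
# Pool of 100 characters in which each letter appears `count` times.
choose_from = "".join(letter * count for letter, count in choice)

# Draw 3900 characters independently from the pool.
result = "".join(random.choice(choose_from) for _ in range(3900))
print(len(result))
# 3900
print({letter: result.count(letter) for letter, _ in choice})
```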
This is my solution; I hope it helps at least a bit:
import random
letters = [1,2,2,2,3,3,3,4,5,5] # Each number represents a letter!
A = 0 # counter for letter A, initialized before the loop
for n in range(0,3900):
    output = letters[random.randint(0,9)]
    if output == 1:
        A += 1
And obviously you can add more if statements, but I'm not sure if this is what you were asking.