Generating a random weighted string file in Python

Question

I'm trying to generate a string of length 3900 from the characters ['A','B','C','D','E'], where each character should appear with the probability {'A':0.1, 'B':0.3, 'C':0.3, 'D':0.1, 'E':0.2}. I wrote the following code:

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]

string_ = ''
for i in range(0,3900):
    string_ = string_ + weighted_choice([("A",10), ("B",30), ("C",30),("D",10),("E",20)])

with open("rand_file","w") as f:
    f.write(string_)

But it doesn't generate the string (file) according to those probabilities. Instead, the characters come out with probabilities like this:

C 0.2500264583 
B 0.2499284457 
E 0.1666428313 
D 0.0833782424 
A 0.0833758065

This is probably because the for loop runs independently each time, without considering previous results.

Any help solving this problem, please?

Answer 1

If you just use the list ['A','B','B','B','C','C','C','D','E','E'] and choose an item from it at random, you can get rid of all that weighting stuff in your code totally, and the weighting will be built in.

You can see that in the following example (yes, I don't doubt it could be written better, but it's only meant to be a proof of concept, not production-ready, pure-as-snow-white code):

from random import random, seed

def choice(lst):
    # pick a uniformly random element of lst
    return lst[int(random() * len(lst))]

seed()

(a, b, c, d, e, t) = (0, 0, 0, 0, 0, 0)

for i in range(1000):
    x = choice('ABBBCCCDEE')
    if (x == 'A'): a += 1
    if (x == 'B'): b += 1
    if (x == 'C'): c += 1
    if (x == 'D'): d += 1
    if (x == 'E'): e += 1
    t += 1

print("a =", a, "which is", a * 100 / t, "%")
print("b =", b, "which is", b * 100 / t, "%")
print("c =", c, "which is", c * 100 / t, "%")
print("d =", d, "which is", d * 100 / t, "%")
print("e =", e, "which is", e * 100 / t, "%")

with the output (roughly) matching the desired distribution:

a = 101 which is 10.1 %
b = 297 which is 29.7 %
c = 299 which is 29.9 %
d = 102 which is 10.2 %
e = 201 which is 20.1 %

Now that's obviously going to be annoying if your distribution is 99.9% A and 0.1% B (it'll be a rather long string passed to choice), but this should be adequate for the distribution you have.
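For skewed distributions like the 99.9% / 0.1% case, one alternative that avoids building a long string is the standard library's random.choices (available since Python 3.6), which accepts weights directly. The following is only a minimal sketch of that option, not part of the original answer; the fixed seed and the variable names are assumptions made for illustration.

```python
import random
from collections import Counter

letters = "ABCDE"
weights = [0.1, 0.3, 0.3, 0.1, 0.2]  # same order as the letters above

# A seeded generator is used only so that repeated runs give the same string.
rng = random.Random(0)

# Draw 3900 independent characters according to the weights and join them once.
result = ''.join(rng.choices(letters, weights=weights, k=3900))

# Count each character to check the observed frequencies.
counts = Counter(result)
for letter in letters:
    print(letter, counts[letter] / len(result))
```

Because each character is drawn independently, the observed frequencies will only approximate the weights rather than match them exactly.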

Answer 2

You can generate all the letters according to the weighting, then randomly shuffle them, and finally join them. Something like:

from random import shuffle

N = 3900  # the string length
doc = {'A': 0.1, 'B': 0.3, 'C': 0.3, 'D': 0.1, 'E': 0.2}  # weights
letters = []
for key, weight in doc.items():
    # number of copies of this letter; round() guards against
    # float results such as 1169.9999999 being truncated by int()
    m = round(weight * N)
    letters.extend(key * m)  # add m copies of the letter

shuffle(letters)  # shuffle all letters randomly
result = ''.join(letters)  # join all letters to make one string

print(len(result))
# 3900
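This approach gives exact counts only when every weight times N is a whole number; otherwise the string can come out shorter or longer than N. One possible way to handle that, sketched here under the assumption that largest-remainder rounding is acceptable, is to floor each count and then hand the leftover slots to the letters with the largest fractional parts. The function names exact_counts and shuffled_string are hypothetical, and integer weights are used to keep the arithmetic exact.

```python
import random

def exact_counts(weights, n):
    """Split n into integer counts proportional to weights (largest remainder)."""
    total = sum(weights.values())
    quotas = {k: w * n / total for k, w in weights.items()}
    counts = {k: int(q) for k, q in quotas.items()}  # floor of each quota
    leftover = n - sum(counts.values())
    # Give the remaining slots to the letters with the largest fractional parts.
    by_fraction = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts

def shuffled_string(weights, n):
    """Build a string of length n with proportional counts, in random order."""
    counts = exact_counts(weights, n)
    letters = [k for k, c in counts.items() for _ in range(c)]
    random.shuffle(letters)
    return ''.join(letters)

weights = {'A': 10, 'B': 30, 'C': 30, 'D': 10, 'E': 20}
result = shuffled_string(weights, 3900)
print(len(result))
```

Note that a string built this way has fixed letter counts, so it is a random permutation rather than a sequence of independent draws.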

Answer 3

This is actually the same as paxdiablo's solution, just a little more general (for your simple example, his solution is better, +1):

import random

choice = [("A",10), ("B",30), ("C",30),("D",10),("E",20)]
choose_from = ''.join(x * letter for letter, x in choice)

print(choose_from)
#  AAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDetc...

print(random.choice(choose_from))
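The snippet above draws a single character. To get the full 3900-character file from the question, the same draw can be repeated and the results joined once at the end. This is a sketch of how that might look, reusing the file name rand_file from the question; the name string_ is also borrowed from there.

```python
import random

choice = [("A", 10), ("B", 30), ("C", 30), ("D", 10), ("E", 20)]

# Repeat each letter according to its weight: 100 characters in total.
choose_from = ''.join(letter * count for letter, count in choice)

# Draw 3900 independent characters and join them in one step.
string_ = ''.join(random.choice(choose_from) for _ in range(3900))

# Write the result to the same file name used in the question.
with open("rand_file", "w") as f:
    f.write(string_)
```

Joining a generator once is usually preferable to repeated string concatenation inside a loop, although at this length either works.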

Answer 4

This is my solution, hope it helps at least a bit:

import random

letters = [1,2,2,2,3,3,3,4,5,5] #Each number represents a letter!
A = 0 #counter for letter 1; it has to be initialised before the loop
for n in range(0,3900):
    output = letters[random.randint(0,9)]
    if output == 1:
        A += 1

And obviously you can add more if statements, but I'm not sure if this is what you were asking.
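Instead of one if statement per letter, the counts could be kept in a single collections.Counter. The sketch below assumes the numbers 1 to 5 stand for the letters A to E, which the answer implies but does not spell out; the names dictionary is introduced here only for illustration.

```python
import random
from collections import Counter

# Each number represents a letter, repeated in proportion to its weight.
letters = [1, 2, 2, 2, 3, 3, 3, 4, 5, 5]
names = {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'}  # assumed mapping

counts = Counter()
for n in range(3900):
    # randint includes both endpoints, so 0..9 covers every list index.
    output = letters[random.randint(0, 9)]
    counts[names[output]] += 1

print(counts)
```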

4 answers

Answer 1
3 2016-12-21 07:56:02

Answer 2
1 2016-12-21 07:58:37

Answer 3
0 2016-12-21 07:59:43

Answer 4
0 2016-12-21 20:37:25
