Generate a random weighted string file in Python

I'm trying to generate a string of length 3900 from the characters ['A','B','C','D','E'], where each character should occur with probability {'A':0.1, 'B':0.3, 'C':0.3, 'D':0.1, 'E':0.2}. I wrote the following code:

from random import random
from bisect import bisect

def weighted_choice(choices):
    values, weights = zip(*choices)
    total = 0
    cum_weights = []
    for w in weights:
        total += w
        cum_weights.append(total)
    x = random() * total
    i = bisect(cum_weights, x)
    return values[i]
string_ = ''
for i in range(0,3900):
    string_ = string_ + weighted_choice([("A",10), ("B",30), ("C",30),("D",10),("E",20)])

with open("rand_file","w") as f:
        f.write(string_)

But it doesn't generate the string (file) according to those probabilities. The frequencies I measure look like this:

C 0.2500264583 
B 0.2499284457 
E 0.1666428313 
D 0.0833782424 
A 0.0833758065 

Probably this is because the for loop runs independently on every iteration, without considering the previous results.

Any help solving this problem, please?

If you just use the list ['A','B','B','B','C','C','C','D','E','E'] and choose an item from it at random, you can get rid of all that weighting stuff in your code totally, and the weighting will be built in.

You can see that in the following example (yes, I don't doubt it could be written better, but it's only meant to be a proof of concept, not production-ready, pure-as-snow-white code):

from random import random, seed

def choice(lst):
    return lst[int(random() * len(lst))]

seed()

(a, b, c, d, e, t) = (0, 0, 0, 0, 0, 0)

for i in range(1000):
    x = choice('ABBBCCCDEE')
    if (x == 'A'): a += 1
    if (x == 'B'): b += 1
    if (x == 'C'): c += 1
    if (x == 'D'): d += 1
    if (x == 'E'): e += 1
    t += 1

print ("a =", a, "which is", a * 100 / t, "%")
print ("b =", b, "which is", b * 100 / t, "%")
print ("c =", c, "which is", c * 100 / t, "%")
print ("d =", d, "which is", d * 100 / t, "%")
print ("e =", e, "which is", e * 100 / t, "%")

with the output matching (roughly) the desired distribution:

a = 101 which is 10.1 %
b = 297 which is 29.7 %
c = 299 which is 29.9 %
d = 102 which is 10.2 %
e = 201 which is 20.1 %

Now that's obviously going to be annoying if your distribution is 99.9% A and 0.1% B (it'll be a rather long string passed to choice), but this should be adequate for the distribution you have.
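For skewed distributions like the 99.9% / 0.1% case, one alternative is to pass the weights directly instead of building a long pool string. The following is a minimal sketch, assuming Python 3.6 or later, where the standard library provides random.choices; the names generate, weights and text are illustrative and not part of the original answer.

```python
import random

# Weights from the question; random.choices does not require them to sum to 1.
weights = {'A': 0.1, 'B': 0.3, 'C': 0.3, 'D': 0.1, 'E': 0.2}

def generate(length, weights, rng=random):
    """Return `length` characters, each drawn independently according to `weights`."""
    letters = list(weights)
    return ''.join(rng.choices(letters, weights=list(weights.values()), k=length))

# One 3900-character string with the question's distribution.
text = generate(3900, weights)
```

The resulting string can be written out with the same open(...) / f.write(...) call used in the question. Each character is still an independent draw, so the observed frequencies will only approximate the weights.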

You can generate all the letters according to the weighting, then randomly shuffle them, and finally join them. Something like:

from random import shuffle
N = 3900 # the string length
doc = {'A':0.1, 'B':0.3, 'C':0.3, 'D':0.1, 'E':0.2 } #weights
letters = []
for key in doc.keys():
    m = round(doc[key] * N)  # number of copies of this letter; round() avoids float truncation
    letters.append(list(key * m))

letters = [item for sublist in letters for item in sublist] # flatten the list
shuffle(letters) # shuffle all letters randomly
result = ''.join(letters) # join all letter to make one string

print(len(result))
# 3900
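Note that the shuffle approach fixes the letter counts exactly, rather than drawing each character independently. When weight * N is not a whole number, the per-letter counts may not add up to N. A hedged sketch of one way to handle that is below: it hands the leftover characters to the letters with the largest fractional parts (a largest-remainder allocation). The function name exact_weighted_string is hypothetical, and this is a swapped-in technique, not something the answer above does.

```python
import random

def exact_weighted_string(length, weights, rng=random):
    """Build a shuffled string whose letter counts match `weights` as closely as possible."""
    total = sum(weights.values())
    # Ideal (possibly fractional) number of copies of each letter.
    ideal = {k: w * length / total for k, w in weights.items()}
    # Round down first; int() floors here because the values are non-negative.
    counts = {k: int(v) for k, v in ideal.items()}
    # Give the remaining characters to the largest fractional parts.
    leftover = length - sum(counts.values())
    by_remainder = sorted(ideal, key=lambda k: ideal[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    letters = [k for k, c in counts.items() for _ in range(c)]
    rng.shuffle(letters)  # random order, fixed counts
    return ''.join(letters)

result = exact_weighted_string(3900, {'A': 0.1, 'B': 0.3, 'C': 0.3, 'D': 0.1, 'E': 0.2})
```

With the question's weights and N = 3900 the products are whole numbers, so the counts come out as 390, 1170, 1170, 390 and 780.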

This is actually the same as paxdiablo's solution, except a little more general (for your simple example, his solution is better. +1):

import random

choice = [("A",10), ("B",30), ("C",30),("D",10),("E",20)]
choose_from = ''.join(x * letter for letter, x in choice)

print(choose_from)
#  AAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDetc...

print(random.choice(choose_from))

This is my solution; I hope it helps at least a bit:

import random

letters = [1,2,2,2,3,3,3,4,5,5]  # each number represents a letter
A = 0  # counter for letter A (must be initialised before the loop)
for n in range(0, 3900):
    output = letters[random.randint(0, 9)]
    if output == 1:
        A += 1

And obviously you can add more if statements, but I'm not sure if this is what you were asking for.
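Instead of one if statement per letter, the tally can be kept in a collections.Counter. A small sketch, assuming the same 1:3:3:1:2 weighting as the list above; the names pool, draws and tally are illustrative.

```python
import random
from collections import Counter

pool = 'ABBBCCCDEE'  # one character per weight unit: A x1, B x3, C x3, D x1, E x2
draws = [random.choice(pool) for _ in range(3900)]
tally = Counter(draws)  # maps each letter to how often it was drawn

# Print each letter with its count and observed frequency.
for letter in sorted(tally):
    print(letter, tally[letter], round(tally[letter] / len(draws), 4))
```

The printed frequencies vary from run to run but should stay close to 0.1, 0.3, 0.3, 0.1 and 0.2.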
