[英]How can i shuffle a endless generator?
假设我有一个包含大量元素的生成器。
实际上,我想制作一个数字表达式机器:
它产生所有可能的数字表达式:
oam = {'sub': [float, float], 'add':[float, float], 'ws_corr':[float, float, int], 'ws_rank':[float, int]}
rm = {int: list(map(str, [1,2,3,4])), float:[o for o in oam] + ['lp', 'ap', 'vol']}
def gen_exp(depth):
if depth > 0:
for op in oam:
for op_terms in it.product(*[rm[ret_type] for ret_type in oam[op]]):
for op_term in it.product(*[[op_term] if op_term not in oam else gen_exp(depth-1) for op_term in op_terms]):
yield '%s(%s)'%(op, ', '.join(map(str, op_term)))
print(list(gen_exp(1))) # ['sub(lp, ap)', 'sub(lp, vol)', 'sub(ap, lp)', 'sub(ap, vol)', 'sub(vol, lp)', 'sub(vol, ap)', 'add(lp, ap)', 'add(lp, vol)', 'add(ap, lp)', 'add(ap, vol)', 'add(vol, lp)', 'add(vol, ap)', 'ws_corr(lp, ap, 1)', 'ws_corr(lp, ap, 2)', 'ws_corr(lp, ap, 3)', 'ws_corr(lp, ap, 4)', 'ws_corr(lp, vol, 1)', 'ws_corr(lp, vol, 2)', 'ws_corr(lp, vol, 3)', 'ws_corr(lp, vol, 4)', 'ws_corr(ap, lp, 1)', 'ws_corr(ap, lp, 2)', 'ws_corr(ap, lp, 3)', 'ws_corr(ap, lp, 4)', 'ws_corr(ap, vol, 1)', 'ws_corr(ap, vol, 2)', 'ws_corr(ap, vol, 3)', 'ws_corr(ap, vol, 4)', 'ws_corr(vol, lp, 1)', 'ws_corr(vol, lp, 2)', 'ws_corr(vol, lp, 3)', 'ws_corr(vol, lp, 4)', 'ws_corr(vol, ap, 1)', 'ws_corr(vol, ap, 2)', 'ws_corr(vol, ap, 3)', 'ws_corr(vol, ap, 4)', 'ws_rank(lp, 1)', 'ws_rank(lp, 2)', 'ws_rank(lp, 3)', 'ws_rank(lp, 4)', 'ws_rank(ap, 1)', 'ws_rank(ap, 2)', 'ws_rank(ap, 3)', 'ws_rank(ap, 4)', 'ws_rank(vol, 1)', 'ws_rank(vol, 2)', 'ws_rank(vol, 3)', 'ws_rank(vol, 4)']
这个生成器非常庞大,当深度为 2 时,expr num 大于 10^8 所以,可以认为是一个无穷大的生成器。
如您所见,生成器是通过类似 for 循环的方法构建的。
所以,问题是,它太有序了。
我希望它更随机(因为相似的表达式可能有相似的结果,我希望它首先访问不同的),但需要保持完整性。
那么,我该如何洗牌呢?
一种想法是使用islice
将生成器的初始段分割为等待提供的项目池,将其打乱,并从中返回项目,定期将更多项目添加到池中。
概念证明:
import itertools,random
def quasi_permute(gen,pool_size):
more = True
pool = list(itertools.islice(gen,pool_size))
random.shuffle(pool)
while(len(pool) > 0):
yield pool.pop()
if len(pool) <= pool_size//2 and more:
more_items = list(itertools.islice(gen,pool_size//2))
pool.extend(more_items)
random.shuffle(pool)
if len(more_items) < pool_size//2:
more = False
#test:
def genrange(n):
yield from range(n)
print(list(quasi_permute(genrange(10),4)))
#typical output: [2, 0, 3, 5, 4, 6, 8, 9, 1, 7]
另一个测试看起来像:
>>> p = quasi_permute(genrange(10**8),10**5)
>>> list(itertools.islice(p,10))
[79324, 80142, 42905, 67430, 26030, 31212, 77972, 71626, 61142, 45597]
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.