Random combinations without duplicates in Python
Suppose you have an iterable t of size n. You want to draw l random combinations of r elements from t. You require that the l combinations are different. Until now my take is the following (inspired by the itertools recipes):
import math as mt
import random as rn

def random_combinations(iterable,r,size):
    n=len(tuple(iterable))
    combinations=[None]*size
    f=mt.factorial # Factorial function
    nCr=f(n)//(f(r)*f(n-r)) # nCr
    iteration_limit=10*nCr # Limit of iterations 10 times nCr
    repeated_combinations=0 # Counter of repeated combinations
    i=0 # Storage index
    combinations[i]=tuple(sorted(rn.sample(range(n),r))) # First combination
    i+=1 # Advance the counting
    while i < size: # Loop for subsequent samples
        indices=tuple(sorted(rn.sample(range(n),r)))
        test=[ combinations[j] for j in range(i) ]
        test.append(indices)
        test=len(list(set(test)))
        if test == i+1: # Test of duplicity
            repeated_combinations=0
            combinations[i]=indices
            i+=1
        else:
            repeated_combinations+=1
        if repeated_combinations == iteration_limit: # Test for iteration limit
            break
    return combinations
Is there a more efficient way to do this? I ask because I will be drawing several combinations from iterables that are huge (over 100 elements).
After selecting the most helpful answer, I confirmed that the problem with that solution was the iteration needed to filter out the combinations that are not selected. However, this inspired me to look for a faster way to filter them. I ended up using sets in the following way:
import math as mt
import random as rn

def random_combinations(iterable,r,l):
    """
    Calculates random combinations from an iterable and returns a light-weight
    iterator.

    Parameters
    ----------
    iterable : sequence, list, iterator or ndarray
        Iterable from which to draw the combinations.
    r : int
        Size of the combinations.
    l : int
        Number of drawn combinations.

    Returns
    -------
    combinations : iterator of tuples
        Random combinations of the elements of the iterable. Iterator object.
    """
    pool=tuple(iterable)
    n=len(pool)
    n_combinations=mt.factorial(n)//(mt.factorial(r)*mt.factorial(n-r)) # nCr
    if l > n_combinations: # Constrain l to be less than or equal to nCr
        l=n_combinations
    combinations=set() # Set storage that discards repeated combinations
    while len(combinations) < l:
        combinations.add(tuple(sorted(rn.sample(range(n),r))))
    def filtro(combi): # Map index combinations to actual values of the iterable
        return tuple(pool[index] for index in combi)
    combinations=(filtro(combi) for combi in combinations) # Light-weight iterator
    return combinations
The set automatically takes care of repeated combinations.
Rather than generating all the combinations and then choosing one of them (their number grows much, much faster than n), instead do the following: draw a sample of r items in order (an implementation in pseudocode is below), store it unless an identical sample has already been stored, and repeat until l samples were stored this way.
The pseudocode referred to is below. See also L. Devroye's Non-Uniform Random Variate Generation, p. 620.
METHOD RandomRItemsInOrder(t, r)
  n = size(t)
  // RNDINTEXC(m) is assumed to return a uniform random integer in [0, m)
  // Special case if r is 1
  if r==1: return [t[RNDINTEXC(n)]]
  i = 0
  kk = r
  ret = NewList()
  while i < n and size(ret) < r
    u = RNDINTEXC(n - i)
    // Keep t[i] with probability kk/(n - i)
    if u < kk
      AddItem(ret, t[i])
      kk = kk - 1
    end
    i = i + 1
  end
  return ret
END METHOD
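A rough Python rendering of the pseudocode, combined with the store-until-l-samples loop described above, might look like the sketch below. The names `random_r_items_in_order` and `distinct_random_combinations` are made up for illustration. The sketch assumes that `random.randrange(m)` plays the role of `RNDINTEXC(m)` (a uniform integer in [0, m)) and that Python 3.8+ is available for `math.comb`. Sampling positions rather than items is also an assumption of this sketch; it keeps duplicate values in t from being mistaken for duplicate samples.

```python
import math
import random


def random_r_items_in_order(t, r):
    """Pick r items of the sequence t uniformly at random, keeping their order."""
    n = len(t)
    ret = []
    needed = r  # items still to be picked (kk in the pseudocode)
    for i in range(n):
        if needed == 0:
            break
        # Keep t[i] with probability needed / (n - i).
        if random.randrange(n - i) < needed:
            ret.append(t[i])
            needed -= 1
    return tuple(ret)


def distinct_random_combinations(t, r, l):
    """Collect l distinct samples, redrawing whenever a duplicate turns up."""
    pool = tuple(t)
    n = len(pool)
    # Assumption: silently cap l at C(n, r), the number of distinct samples.
    l = min(l, math.comb(n, r))
    seen = set()  # stores tuples of positions, so duplicates are discarded
    while len(seen) < l:
        seen.add(random_r_items_in_order(range(n), r))
    # Translate positions back into the actual items.
    return [tuple(pool[i] for i in combo) for combo in seen]
```

For example, `distinct_random_combinations("abcdef", 3, 10)` would return ten different 3-letter tuples, each in the original order of the letters. The redraw loop slows down as l approaches C(n, r), because most new draws are then duplicates.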
Instead of the pseudocode above, you can also generate a random sample via reservoir sampling, but then it won't be trivial to maintain a canonical order for the sample.
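For comparison, here is a minimal sketch of basic reservoir sampling (often called Algorithm R) over the positions 0..n-1. Sorting the chosen positions at the end is one possible way to get a canonical order, at the cost of an extra sorting step. That choice and the name `reservoir_sample_indices` are assumptions of this sketch, not part of the answer.

```python
import random


def reservoir_sample_indices(n, r):
    """Return r distinct positions from range(n), chosen uniformly, sorted."""
    if r > n:
        raise ValueError("r must not exceed n")
    reservoir = list(range(r))  # start with the first r positions
    for i in range(r, n):
        j = random.randrange(i + 1)  # uniform integer in [0, i]
        if j < r:
            reservoir[j] = i  # position i replaces a random slot
    # The reservoir is unordered; sorting gives a canonical form
    # that can be stored in a set to detect duplicates.
    return tuple(sorted(reservoir))
```

The reservoir itself ends up in no particular order, which is why the explicit sort is needed before two samples can be compared.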
You could select l random indices in the C(n,r) sequence and return the combinations corresponding to these selected random indices.
import itertools
import random
import math

def random_combinations(iterable, r, l):
    copy1, copy2 = itertools.tee(iterable)
    num_combos = math.comb(sum(1 for _ in copy1), r)
    rand_indices = set(random.sample(range(num_combos), l))
    combos = itertools.combinations(copy2, r)
    selected_combos = (x[1] for x in enumerate(combos) if x[0] in rand_indices)
    return list(itertools.islice(selected_combos, l))
To avoid iterating through the combinations, we need a mechanism to skip over combinations. I am not sure such a mechanism exists in Python's standard library.
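One way to skip over combinations is to map an index directly to the combination at that position in lexicographic order, a technique often called unranking. The third-party more-itertools package offers an `nth_combination` function along these lines. The sketch below is a self-contained version under stated assumptions: `random_combinations_by_index` is a hypothetical name, Python 3.8+ is assumed for `math.comb`, and indices are drawn with `random.randrange` into a set because `random.sample(range(...))` may not work once C(n, r) exceeds the platform's maximum sequence length.

```python
import math
import random


def nth_combination(pool, r, index):
    """Return the combination at position `index` in lexicographic order."""
    n = len(pool)
    c = math.comb(n, r)
    if not 0 <= index < c:
        raise IndexError("combination index out of range")
    result = []
    while r:
        # c becomes the number of combinations that start with the current item.
        c, n, r = c * r // n, n - 1, r - 1
        while index >= c:
            # Skip every combination that starts with the current item.
            index -= c
            c, n = c * (n - r) // n, n - 1
        result.append(pool[-1 - n])
    return tuple(result)


def random_combinations_by_index(iterable, r, l):
    """Draw l distinct combinations by unranking l distinct random indices."""
    pool = tuple(iterable)
    total = math.comb(len(pool), r)
    l = min(l, total)  # assumption: cap l instead of raising an error
    indices = set()
    while len(indices) < l:
        indices.add(random.randrange(total))  # works for very large totals
    return [nth_combination(pool, r, i) for i in indices]
```

With this approach the cost per combination depends on n and r rather than on C(n, r), so a call such as `random_combinations_by_index(range(200), 100, 5)` never has to enumerate the combinations it does not return.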