Random combinations without duplicates in Python

Suppose you have an iterable t of size n. You want to draw l random combinations of r elements from t, and you require that the l combinations all be different. So far my approach is the following (inspired by the itertools recipes):

import math as mt
import random as rn

def random_combinations(iterable,r,size):
 n=len(tuple(iterable))
 combinations=[None]*size
 
 f=mt.factorial                              # Factorial function
 nCr=f(n)//(f(r)*f(n-r))                     # nCr
 iteration_limit=10*nCr                      # Limit of iterations: 10 times nCr
 
 repeated_combinations=0                     # Counter of repeated combinations
 i=0                                         # Storage index
 
 combinations[i]=tuple(sorted(rn.sample(range(n),r)))   # First combination
 i+=1                                                   # Advance the count
 
 while i < size:                             # Loop for subsequent samples
  indices=tuple(sorted(rn.sample(range(n),r)))
  test=[ combinations[j] for j in range(i) ]
  test.append(indices)
  test=len(list(set(test)))
  if test == i+1:                            # Test for duplicates
   repeated_combinations=0
   combinations[i]=indices
   i+=1
  else:
   repeated_combinations+=1
  if repeated_combinations == iteration_limit:   # Test for iteration limit
   break
 return combinations

Is there a more efficient way to do this? I ask because I will be drawing several combinations from iterables that are large (over 100 elements).


After selecting the most helpful answer, I confirmed that the problem with that solution was the iteration needed to filter out the combinations that are not selected. However, it inspired me to look for a faster way to filter them. I ended up using sets in the following way:

import itertools as it
import math as mt
import random as rn

def random_combinations(iterable,r,l):
 """
 Calculates random combinations from an iterable and returns a light-weight
 iterator.
 
 Parameters
 ----------
 
 iterable : sequence, list, iterator or ndarray
     Iterable from which to draw the combinations.
 r : int
     Size of the combinations.
 l : int
     Number of drawn combinations.
 
 Returns
 -------
 
 combinations : iterator of tuples
     Random combinations of the elements of the iterable.
 
 """
 pool=tuple(iterable)
 n=len(pool)
 
 n_combinations=mt.comb(n,r)               # nCr (math.comb needs Python 3.8+)
 if l > n_combinations:       # Constrain l to be less than or equal to nCr
  l=n_combinations
 
 combinations=set()           # Set storage that discards repeated combinations
 while len(combinations) < l:
  combinations.add(tuple(sorted(rn.sample(range(n),r))))
 
 def filtro(combi):       # Index combinations to actual values of the iterable
  return tuple(pool[index] for index in combi)
 
 combinations=map(filtro,combinations)   # Light-weight iterator (it.imap in Python 2)
 
 return combinations

The set automatically takes care of repeated combinations.

Rather than generating all the combinations and then choosing among them (the number of combinations grows much faster than n), do the following:

  • Create an empty hash table or set.
  • Choose a sample of r items in order (an implementation in pseudocode is below).
  • Check whether the sample is already present (e.g., as a key in the hash table or a value in the set). If not, store that sample (e.g., as a key in the hash table or a value in the set).
  • Continue until l samples were stored this way.

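The pseudocode referred to above follows. Here RNDINTEXC(m) is taken to return a uniform random integer in the interval [0, m), as its use in the r == 1 special case suggests. See also L. Devroye's Non-Uniform Random Variate Generation, p. 620.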

METHOD RandomRItemsInOrder(t, r)
  n = size(t)
  // Special case if r is 1
  if r==1: return [t[RNDINTEXC(n)]]
  i = 0
  kk = r
  ret = NewList()
  while i < n and size(ret) < r
    u = RNDINTEXC(n - i)
    if u < kk  // keep t[i] with probability kk / (n - i)
      AddItem(ret, t[i])
      kk = kk - 1
    end
    i = i + 1
  end
  return ret
END METHOD
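
The steps and pseudocode above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names `random_r_items_in_order` and `distinct_random_combinations` are made up for illustration, and the sketch samples positions rather than items so that inputs with repeated values still work. Note that it keeps item i when the random draw is strictly less than the number of items still needed, which gives a probability of needed / (n - i).

```python
import math
import random

def random_r_items_in_order(t, r):
    """Selection sampling: choose r items of t, preserving their original order."""
    n = len(t)
    if not 0 <= r <= n:
        raise ValueError("r must be between 0 and len(t)")
    chosen = []
    needed = r                      # how many items are still to be picked
    for i, item in enumerate(t):
        if needed == 0:
            break
        # Keep item i with probability needed / (n - i).
        if random.randrange(n - i) < needed:
            chosen.append(item)
            needed -= 1
    return tuple(chosen)

def distinct_random_combinations(t, r, l):
    """Draw up to l distinct combinations by rejecting samples already seen."""
    pool = tuple(t)
    l = min(l, math.comb(len(pool), r))   # cannot draw more than C(n, r)
    seen = set()                          # stores tuples of positions
    while len(seen) < l:
        seen.add(random_r_items_in_order(range(len(pool)), r))
    return [tuple(pool[i] for i in combo) for combo in seen]
```

For example, `distinct_random_combinations("abcdef", 3, 10)` returns ten different 3-letter tuples, each in the order the letters appear in the input. The rejection loop is cheap while l is small relative to C(n, r), and slows down as l approaches C(n, r).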

Instead of the pseudocode above, you can also generate a random sample via reservoir sampling, but then it won't be trivial to maintain a canonical order for the sample.
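
One possible workaround, shown here only as a sketch, is to remember each item's position in the reservoir and sort by position at the end. The function name `reservoir_sample_sorted` is hypothetical, and the sketch uses the basic form of reservoir sampling (often called Algorithm R); it costs an extra sort of r entries.

```python
import random

def reservoir_sample_sorted(iterable, r):
    """Reservoir-sample r items, then restore their original relative order."""
    reservoir = []                              # (position, item) pairs
    for position, item in enumerate(iterable):
        if position < r:
            reservoir.append((position, item))  # fill the reservoir first
        else:
            j = random.randrange(position + 1)  # uniform in [0, position]
            if j < r:
                reservoir[j] = (position, item) # replace a random slot
    reservoir.sort(key=lambda pair: pair[0])    # canonical order by position
    return tuple(item for _, item in reservoir)
```

If the iterable has fewer than r items, this sketch simply returns all of them in order.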

You could select l random indices in the sequence of C(n,r) combinations and return the combinations corresponding to those indices.

import itertools
import random
import math

def random_combinations(iterable, r, l):
    copy1, copy2 = itertools.tee(iterable)
    num_combos = math.comb(sum(1 for _ in copy1), r)
    rand_indices = set(random.sample(range(num_combos), l))
    combos = itertools.combinations(copy2, r)
    selected_combos = (x[1] for x in  enumerate(combos) if x[0] in rand_indices)
    return list(itertools.islice(selected_combos, l))

To avoid iterating through the combinations, we need a mechanism to skip over combinations. I am not sure such a mechanism exists in Python's standard library.
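
One way to skip ahead, sketched below under stated assumptions, is to compute the combination at a given index directly, counting how many combinations begin with each candidate element. The helper names `combination_at` and `random_combinations_by_index` are hypothetical. The sketch assumes the same lexicographic order that `itertools.combinations` produces, and it picks indices with `random.randrange` in a loop, since `range` objects do not support lengths beyond the platform's maximum size.

```python
import math
import random

def combination_at(pool, r, index):
    """Return the combination at position `index` in lexicographic order."""
    n = len(pool)
    if not 0 <= index < math.comb(n, r):
        raise IndexError("combination index out of range")
    result = []
    start = 0
    for remaining in range(r, 0, -1):
        for pos in range(start, n):
            # Number of combinations whose next element is pool[pos].
            count = math.comb(n - pos - 1, remaining - 1)
            if index < count:
                result.append(pool[pos])
                start = pos + 1
                break
            index -= count          # skip that whole group at once
    return tuple(result)

def random_combinations_by_index(iterable, r, l):
    """Draw up to l distinct combinations via distinct random indices."""
    pool = tuple(iterable)
    total = math.comb(len(pool), r)
    l = min(l, total)
    chosen = set()
    while len(chosen) < l:
        chosen.add(random.randrange(total))
    return [combination_at(pool, r, index) for index in chosen]
```

Each lookup performs at most n calls to `math.comb`, so the cost depends on n and r, not on the total number of combinations.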
