How do I get all unique combinations and their multiplicities from a Python list?

I know that itertools has a method for generating combinations, as described in "Get unique combinations of elements from a python list". However, I'm looking for an iterator that gives unique combinations together with their multiplicities.

Example: I have an expression that depends only on which combination of 2 elements I select from a list L = [2,1,2,2]. I need to sum the result over all combinations. What I want is an iterator that gives e.g. (([1,2], 3), ([2,2], 3)). That way, I can compute the expression for just the 2 unique combinations and multiply each by 3, rather than computing it for all 6 combinations, many of which give the same result.

You can combine itertools.combinations with collections.Counter.

import itertools
import collections  

L =  [2,1,2,2]
c = collections.Counter()
c.update(map(tuple, map(sorted, itertools.combinations(L, 2))))

c.items() then gives:

>>> c.items()
[((1, 2), 3), ((2, 2), 3)]

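To break it down, itertools.combinations(L, 2) yields every combination of L of length 2, with the elements in the order they appear in L. We then apply sorted so that combinations containing the same elements in a different order, such as (2, 1) and (1, 2), compare equal, since collections.Counter uses hashing and equality to count. Finally, because sorted returns list objects, which are not hashable, we convert them to tuple objects, which are.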
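To connect this back to the question, here is a minimal sketch of how the counts could be used to sum an expression over all pairs. The function `expr` is a hypothetical stand-in (the question does not say what the real expression is), and the sketch assumes the expression depends only on which elements are chosen, not on their order. It also sorts L once up front instead of sorting each combination, which is a variation on the code above rather than the answer's exact approach.

```python
import itertools
import collections

def expr(comb):
    # Hypothetical placeholder for the real expression: the product of the pair.
    return comb[0] * comb[1]

L = [2, 1, 2, 2]

# Sorting L first makes every emitted tuple already sorted, so equal
# combinations hash identically without a per-combination sort.
counts = collections.Counter(itertools.combinations(sorted(L), 2))

# Evaluate expr once per unique combination and weight it by its multiplicity.
weighted_total = sum(expr(comb) * mult for comb, mult in counts.items())

# Reference value: evaluate expr on every combination individually.
naive_total = sum(expr(comb) for comb in itertools.combinations(L, 2))

print(weighted_total, naive_total)
```

Note that this still enumerates every combination once to build the counts; it only saves repeated evaluations of the expression.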

In the end, my code took too long when explicitly counting every possible combination, so I came up with a way to find only the unique ones and then compute their multiplicities analytically. It's based on the following idea: Call the input list A and the number of elements in each subset k. First sort the list and initialize k pointers to the first k elements of A. Then repeatedly attempt to move the rightmost pointer to the right until it encounters a new value. Every time a pointer other than the rightmost is moved, all pointers to its right are reset to the positions immediately following it, e.g. if pointer 1 is moved to index 6, pointer 2 is moved to index 7 and so on.

The multiplicity of any combination C can be found by multiplying the binomial coefficients C(N_i, m_i) ("N_i choose m_i") over the distinct elements i of C, where N_i and m_i are the number of occurrences of element i in A and C, respectively.
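As a rough illustration of that formula, the sketch below computes the multiplicity of a single combination. The helper name `multiplicity` is made up for this example, and it uses `math.comb` from the standard library (Python 3.8+) rather than the `scipy.special.binom` import that appears in the code further down, so treat it as one possible way to write it.

```python
from collections import Counter
from math import comb  # available in Python 3.8+

def multiplicity(A, C):
    """Number of ways to pick the multiset C out of the list A."""
    available = Counter(A)  # N_i: occurrences of each element in A
    chosen = Counter(C)     # m_i: occurrences of each element in C
    result = 1
    for element, m in chosen.items():
        # comb(N, m) is 0 when m > N, so an impossible C gives 0.
        result *= comb(available[element], m)
    return result

A = [2, 1, 2, 2]
print(multiplicity(A, (1, 2)))  # C(1,1) * C(3,1) = 3
print(multiplicity(A, (2, 2)))  # C(3,2) = 3
```

For the list from the question, 1 occurs once and 2 occurs three times, which gives 1 * 3 = 3 for the combination (1, 2) and 3 for (2, 2), matching the counts obtained by brute force.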

Below is an implementation of a brute force approach, and a method which exploits uniqueness.

This figure compares the runtime of brute-force counting with my approach. Brute-force counting becomes infeasible when the input list has about 20 elements. [Figure: runtime comparison]

# -*- coding: utf-8 -*-
from __future__ import division

from itertools import combinations
from collections import Counter
from operator import mul
import numpy as np
from scipy.special import binom

def brute(A, k):
    '''This works, but counts every combination.'''
    A_sorted = sorted(A)
    d = {}
    for comb in combinations(A_sorted, k):
        try:
            d[comb] += 1
        except KeyError:
            d[comb] = 1
        #
    return d


def get_unique_unordered_combinations(A, k):
    '''Returns all unique unordered subsets with size k of input array.'''
    if k < 0:
        raise ValueError("k must be non-negative")

    if k > len(A):
        return  # No subset of size k exists, so yield nothing (matches brute)

    if k == 0:
        yield ()
        return  # Done. There's only one way to select zero elements :)

    # Sorted version of input list
    A = np.array(sorted(A))
    # Indices of currently selected combination
    inds = list(range(k))  # a real list, so single indices can be reassigned
    # Pointer to the index we're currently trying to increment
    lastptr = len(inds) - 1

    # Construct list of indices of next element of A different from current.
    # e.g. [1,1,1,2,2,7] -> [3,3,3,5,5,6] (6 falls off list)
    skipper = [len(A) for a in A]
    prevind = 0
    for i in range(1, len(A)):
        if A[i] != A[prevind]:
            # Every index in the run [prevind, i) skips ahead to i
            for j in range(prevind, i):
                skipper[j] = i
            prevind = i

    while True:
        # Yield current combination from current indices
        comb = tuple(A[inds])
        yield comb

        # Attempt to advance the indices, starting with the rightmost one
        for p in range(lastptr, -1, -1):
            nextind = skipper[inds[p]]
            if nextind + (lastptr - p) >= len(A):
                continue  # No room to move this pointer. Try the next
            # Move pointer p and pack all pointers to its right directly after it
            for i in range(lastptr - p + 1):
                inds[p + i] = nextind + i
            break
        else:
            # We've exhausted all possibilities, so there are no combs left
            return
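
The listing above yields the unique combinations but does not show the multiplicities being attached to them. As one possible way to get both at once, here is a standard-library-only sketch that uses a different technique from the pointer approach: it recurses over the distinct values of A and chooses how many copies of each to take, multiplying the binomial coefficients along the way. The function name `unique_combinations_with_multiplicity` is hypothetical, `math.comb` requires Python 3.8+, and no attempt is made to prune dead branches.

```python
from collections import Counter
from itertools import combinations
from math import comb  # available in Python 3.8+

def unique_combinations_with_multiplicity(A, k):
    """Yield (combination, multiplicity) for each unique size-k selection from A."""
    if k < 0:
        raise ValueError("k must be non-negative")

    items = sorted(Counter(A).items())  # [(value, occurrences), ...]

    def rec(pos, remaining, prefix, mult):
        if remaining == 0:
            yield prefix, mult
            return
        if pos == len(items):
            return  # ran out of distinct values before filling k slots
        value, n = items[pos]
        # Take m copies of the current value, largest m first, so the
        # combinations come out in lexicographic order.
        for m in range(min(n, remaining), -1, -1):
            yield from rec(pos + 1, remaining - m,
                           prefix + (value,) * m, mult * comb(n, m))

    yield from rec(0, k, (), 1)

A = [2, 1, 2, 2]
pairs = list(unique_combinations_with_multiplicity(A, 2))
print(pairs)

# Cross-check against explicit counting of every combination.
assert dict(pairs) == Counter(combinations(sorted(A), 2))
```

The final assertion compares the result with brute-force counting on the small list from the question; the same check is only practical for short inputs, which is the limitation that motivated the analytic approach in the first place.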
