How do I get all unique combinations and their multiplicities from a Python list?

I know that itertools has a method for generating combinations, as described here: Get unique combinations of elements from a python list. However, I'm looking for an iterator that gives the unique combinations together with their multiplicities.

Example: I have an expression that only depends on which combination of 2 elements I select from a list L = [2,1,2,2]. I need to sum the result over all combinations. What I want is an iterator that gives e.g. (([1,2], 3), ([2,2], 3)). That way, I can compute the expression for just the 2 unique combinations and multiply each by 3, rather than computing it for all 6 combinations, many of which give the same result.

You can combine itertools.combinations with collections.Counter.

import itertools
import collections  

L =  [2,1,2,2]
c = collections.Counter()
c.update(map(tuple, map(sorted, itertools.combinations(L, 2))))

c.items() then gives (output shown for Python 2; Python 3 wraps the same pairs in dict_items(...)):

>>> c.items()
[((1, 2), 3), ((2, 2), 3)]

To break it down, itertools.combinations(L, 2) gives all combinations of L of length 2, treating elements at different positions as distinct and keeping each pair in the order its elements appear in L. We then apply sorted so that, for example, (2, 1) and (1, 2) are counted as the same combination, since collections.Counter uses hashing and equality to count. Finally, because sorted returns list objects, which are not hashable, we convert them to tuple objects, which are.
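
As a rough sketch of how these counts could be used for the sum described in the question: the function `expr` below is a hypothetical stand-in for the asker's expression (it is not from the original post), and the only assumption is that it depends solely on which elements are selected.

```python
import itertools
import collections


def expr(comb):
    # Hypothetical stand-in for the expression from the question;
    # it only depends on which elements were selected.
    return sum(comb) ** 2


L = [2, 1, 2, 2]
counts = collections.Counter(
    tuple(sorted(comb)) for comb in itertools.combinations(L, 2)
)

# Evaluate expr once per unique combination and weight by multiplicity.
weighted_total = sum(expr(comb) * mult for comb, mult in counts.items())

# Reference value: evaluate expr for every one of the 6 combinations.
brute_total = sum(expr(comb) for comb in itertools.combinations(L, 2))

print(weighted_total, brute_total)  # 75 75
```

Note that building the Counter this way still walks through every combination, so it only saves work when evaluating the expression is the expensive part.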

In the end, my code took too long when it explicitly counted every possible combination, so I came up with a way to find only the unique ones and then compute their multiplicities analytically. It's based on the following idea: Call the input list A and the number of elements in each subset k. First sort the list and initialize k pointers to the first k elements of A. Then repeatedly try to move the rightmost pointer to the right until it encounters a new value. If it has no room to move, try the next pointer to its left instead. Whenever a pointer other than the rightmost is moved, all pointers to its right are placed at the indices immediately following it, e.g. if pointer 1 is moved to index 6, pointer 2 is moved to index 7 and so on.

The multiplicity of any combination C can be found by multiplying the binomial coefficients binom(N_i, m_i) over all distinct elements i, where N_i and m_i are the number of occurrences of element i in A and C, respectively.
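
A minimal sketch of that formula, assuming Python 3.8+ for `math.comb`. The helper name `multiplicity` is made up for illustration and is not part of the original answer.

```python
from collections import Counter
from math import comb  # Python 3.8+


def multiplicity(A, C):
    """Number of ways to pick the combination C out of the list A."""
    n_in_A = Counter(A)  # N_i: occurrences of each element in A
    m_in_C = Counter(C)  # m_i: occurrences of each element in C
    result = 1
    for value, m in m_in_C.items():
        # A Counter returns 0 for missing keys, so an element that is
        # absent from A contributes comb(0, m) == 0.
        result *= comb(n_in_A[value], m)
    return result


A = [2, 1, 2, 2]
print(multiplicity(A, (1, 2)))  # 3
print(multiplicity(A, (2, 2)))  # 3
```

For L = [2,1,2,2] this reproduces the counts from the question: (1, 2) gives binom(1, 1) * binom(3, 1) = 3 and (2, 2) gives binom(3, 2) = 3.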

Below is an implementation of a brute force approach, and a method which exploits uniqueness.

This figure compares the runtime of brute force counting vs. my approach. Counting becomes infeasible when the input list has about 20 elements.

[Figure: runtime comparison]

# -*- coding: utf-8 -*-
from __future__ import division

from itertools import combinations
from collections import Counter
from operator import mul
import numpy as np
from scipy.special import binom

def brute(A, k):
    '''This works, but counts every combination.'''
    A_sorted = sorted(A)
    d = {}
    for comb in combinations(A_sorted, k):
        try:
            d[comb] += 1
        except KeyError:
            d[comb] = 1
        #
    return d


def get_unique_unordered_combinations(A, k):
    '''Returns all unique unordered subsets with size k of input array.'''
    if k < 0:
        raise ValueError("k must be non-negative")

    # There is exactly one way to select zero elements: the empty tuple.
    if k == 0:
        yield ()
        return

    # More elements requested than available: there are no combinations.
    if k > len(A):
        return

    # Sorted version of input list
    A = np.array(sorted(A))
    # Indices of currently selected combination
    inds = list(range(k))  # a list, so entries can be reassigned on Python 3 too
    # Pointer to the index we're currently trying to increment
    lastptr = len(inds) - 1

    # Construct list of indices of next element of A different from current.
    # e.g. [1,1,1,2,2,7] -> [3,3,3,5,5,6] (6 falls off list)
    skipper = [len(A)] * len(A)
    prevind = 0
    for i in range(1, len(A)):
        if A[i] != A[prevind]:
            for j in range(prevind, i):
                skipper[j] = i
            prevind = i

    while True:
        # Yield current combination from current indices
        comb = tuple(A[inds])
        yield comb

        # Try to advance the indices, starting with the rightmost one
        for p in range(lastptr, -1, -1):
            nextind = skipper[inds[p]]
            if nextind + (lastptr - p) >= len(A):
                continue  # No room to move this pointer. Try the next
            # Move pointer p and pack all pointers to its right directly after it
            for i in range(lastptr - p + 1):
                inds[p + i] = nextind + i
            break
        else:
            # We've exhausted all possibilities, so there are no combs left
            return
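
For comparison, here is a self-contained sketch that yields each unique combination together with its multiplicity in one pass. It is not the pointer-based method above: it recurses over the distinct values of A and applies the binomial-coefficient product along the way. The function name `unique_combinations_with_multiplicity` is hypothetical, and Python 3.8+ is assumed for `math.comb`.

```python
from collections import Counter
from itertools import combinations
from math import comb  # Python 3.8+


def unique_combinations_with_multiplicity(A, k):
    """Yield (combination, multiplicity) for each unique size-k selection."""
    items = sorted(Counter(A).items())  # [(value, N_i), ...]

    def recurse(pos, remaining, chosen, mult):
        if remaining == 0:
            yield tuple(chosen), mult
            return
        if pos == len(items):
            return  # ran out of distinct values before picking k elements
        value, n = items[pos]
        # Try taking m copies of this value, for every feasible m.
        for m in range(min(n, remaining), -1, -1):
            yield from recurse(pos + 1, remaining - m,
                               chosen + [value] * m, mult * comb(n, m))

    return recurse(0, k, [], 1)


A = [2, 1, 2, 2]
print(list(unique_combinations_with_multiplicity(A, 2)))
# [((1, 2), 3), ((2, 2), 3)]

# Cross-check against explicit counting of every combination.
brute = Counter(tuple(sorted(c)) for c in combinations(A, 2))
assert dict(unique_combinations_with_multiplicity(A, 2)) == brute
```

The work here scales with the number of unique combinations rather than with the total number of combinations, which is the same saving the answer above is after.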
