How do I get all unique combinations and their multiplicities from a Python list?
I know that itertools has a method for generating combinations, as described here: Get unique combinations of elements from a python list. However, I'm looking for an iterator that gives unique combinations and their multiplicities.
Example: I have an expression that only depends on which combination of 2 elements I select from a list L = [2,1,2,2]. I need to sum the result over all combinations. What I want is an iterator that gives e.g. (([1,2], 3), ([2,2], 3)). That way, I can compute the expression for just the 2 unique combinations and multiply each by 3, rather than computing it for all 6 combinations, many of which give the same result.
You can combine itertools.combinations with collections.Counter.
import itertools
import collections
L = [2,1,2,2]
c = collections.Counter()
c.update(map(tuple, map(sorted, itertools.combinations(L, 2))))
c.items() then gives:
>>> c.items()
[((1, 2), 3), ((2, 2), 3)]
To break it down, itertools.combinations(L, 2) gives all combinations of L of length 2, with the elements of each tuple in the order they appear in L. We then use sorted to put each combination in a canonical order, since collections.Counter uses hashing and equality to count, and (2, 1) and (1, 2) would otherwise be counted separately. Finally, because list objects are not hashable, we convert them to tuple objects, which are.
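To tie this back to the question, here is a minimal sketch of how the counted combinations could be used to sum an expression while evaluating it only once per unique combination. The helper name weighted_sum and the example expression (the product of the two selected elements) are assumptions made for illustration; they are not part of the original answer.

```python
import collections
import itertools


def weighted_sum(values, k, f):
    """Sum f over all k-combinations, calling f once per unique combination."""
    # Count each combination in canonical (sorted) order.
    counts = collections.Counter(
        tuple(sorted(comb)) for comb in itertools.combinations(values, k)
    )
    # Weight each unique result by how many times the combination occurs.
    return sum(mult * f(comb) for comb, mult in counts.items())


# Hypothetical expression: the product of the two selected elements.
total = weighted_sum([2, 1, 2, 2], 2, lambda comb: comb[0] * comb[1])
print(total)
```

Note that this still enumerates every combination to build the counts; it only saves repeated evaluations of the expression itself.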
In the end, my code took too long to explicitly count every possible combination, so I came up with a way to find only the unique ones and then compute their multiplicities analytically. It's based on the following idea: call the input list A and the number of elements in each subset k. First sort the list and initialize k pointers to the first k elements of A. Then repeatedly try to move the rightmost pointer to the right until it reaches a new value; if it cannot move any further, try the next pointer to its left instead. Whenever a pointer other than the rightmost is moved, all pointers to its right are reset to the positions immediately after it, e.g. if pointer 1 is moved to index 6, pointer 2 is moved to index 7, and so on.
The multiplicity of any combination C can be found by multiplying the binomial coefficients C(N_i, m_i) ("N_i choose m_i") over the distinct elements i of C, where N_i and m_i are the number of occurrences of element i in A and in C, respectively.
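The binomial-product rule above can be sketched as follows. This is only an illustration: the function name multiplicity is hypothetical, and it uses math.comb, which assumes Python 3.8 or later (the answer's own code imports scipy.special.binom instead).

```python
from collections import Counter
from math import comb  # assumes Python 3.8+


def multiplicity(A, C):
    """Number of ways the combination C can be picked from the list A."""
    n_in_A = Counter(A)  # N_i: occurrences of each element in A
    m_in_C = Counter(C)  # m_i: occurrences of each element in C
    result = 1
    for value, m in m_in_C.items():
        # Choose which m of the N_i copies of this value are used.
        result *= comb(n_in_A[value], m)
    return result


A = [2, 1, 2, 2]
print(multiplicity(A, (1, 2)))  # C(1,1) * C(3,1)
print(multiplicity(A, (2, 2)))  # C(3,2)
```

For the example list from the question, both unique combinations come out with multiplicity 3, matching the counts found by explicit enumeration.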
Below is an implementation of a brute-force approach, and a method which exploits uniqueness.
This figure compares the runtime of brute-force counting vs. my approach. Counting becomes infeasible when the input list has about 20 elements.
# -*- coding: utf-8 -*-
from __future__ import division
from itertools import combinations
from collections import Counter
from operator import mul
import numpy as np
from scipy.special import binom
def brute(A, k):
    '''This works, but counts every combination.'''
    A_sorted = sorted(A)
    d = {}
    for comb in combinations(A_sorted, k):
        try:
            d[comb] += 1
        except KeyError:
            d[comb] = 1
    return d

def get_unique_unordered_combinations(A, k):
    '''Yields all unique unordered subsets with size k of input array.'''
    if k < 0:
        raise ValueError("k must be non-negative")
    # There is exactly one way to select zero elements: the empty tuple.
    if k == 0:
        yield ()
        return
    # Asking for more elements than A holds leaves no combinations at all.
    if k > len(A):
        return
    # Sorted version of input list
    A = np.array(sorted(A))
    # Indices of currently selected combination
    # (a list, so it can be modified in place on both Python 2 and 3)
    inds = list(range(k))
    # Pointer to the index we're currently trying to increment
    lastptr = len(inds) - 1
    # Construct list of indices of next element of A different from current.
    # e.g. [1,1,1,2,2,7] -> [3,3,3,5,5,6] (6 falls off list)
    skipper = [len(A) for a in A]
    prevind = 0
    for i in range(1, len(A)):
        if A[i] != A[prevind]:
            for j in range(prevind, i):
                skipper[j] = i
            prevind = i
    while True:
        # Yield current combination from current indices
        comb = tuple(A[inds])
        yield comb
        # Try to change indices, starting with rightmost index
        for p in range(lastptr, -1, -1):
            nextind = skipper[inds[p]]
            if nextind + (lastptr - p) >= len(A):
                continue  # No room to move this pointer. Try the next
            # Move pointer p and pack all pointers to its right next to it
            for i in range(lastptr - p + 1):
                inds[p + i] = nextind + i
            break
        else:
            # We've exhausted all possibilities, so there are no combs left
            return