
Python combinations without repetitions

I have a list of numbers and I want to make combinations from it. If I have the list:

import itertools

t = [2,2,2,2,4]
c = list(itertools.combinations(t, 4))

The result is:

(2, 2, 2, 2)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)

but I want to get:

(2, 2, 2, 2)
(2, 2, 2, 4)

Is it possible to eliminate the duplicates without making a new list and going through the first list?

As Donkey Kong points out, you can get the unique values in a list by converting the list to a set:

import itertools

t = [2,2,2,2,4]
c = list(itertools.combinations(t, 4))
unq = set(c)
print(unq)

And the result will be:

{(2, 2, 2, 4), (2, 2, 2, 2)}

If you want to use it as a list, you can convert it back:

result = list(unq)

A cleaner, more concise alternative is:

import itertools

t = [2,2,2,2,4]
c = set(itertools.combinations(t, 4))
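A set does not keep the order in which the combinations were produced. If that order matters, here is a minimal sketch, assuming Python 3.7+ (where dict preserves insertion order), that drops duplicates while keeping first-seen order. The name unique_in_order is only illustrative:

```python
import itertools

t = [2, 2, 2, 2, 4]

# dict.fromkeys keeps the first occurrence of each tuple, in the order
# itertools.combinations produced it, and silently drops later repeats.
unique_in_order = list(dict.fromkeys(itertools.combinations(t, 4)))

print(unique_in_order)  # [(2, 2, 2, 2), (2, 2, 2, 4)]
```

Like the set version, this still generates every positional combination before discarding the repeats.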

I know this is late, but I want to add a point.

set(itertools.combinations(t, 4)) does a fine job in most cases, but it still iterates over all the repeated combinations internally, so it can be computationally heavy. This is especially the case when there are only a few actual unique combinations.
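To get a feel for the gap, here is a small sketch with a made-up input (twenty 2s and one 4, chosen only for illustration) that compares how many tuples itertools.combinations generates with how many survive the set. It assumes Python 3.8+ for math.comb:

```python
import itertools
import math

t = [2] * 20 + [4]  # hypothetical input: twenty 2s and a single 4
r = 10

# combinations() yields one tuple per choice of r positions out of len(t)
generated = math.comb(len(t), r)

# Only the distinct tuples remain after passing through set()
unique = len(set(itertools.combinations(t, r)))

print(generated, unique)  # 352716 2
```

Every one of those generated tuples is built and hashed just to end up with two results.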

This one iterates over the unique combinations only:

from itertools import chain,repeat,count,islice
from collections import Counter

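def combinations_without_repetition(r, iterable=None, values=None, counts=None):
    # Either pass an iterable of hashable items, or pass the distinct
    # values together with how many times each one occurs (counts).
    if iterable:
        values, counts = zip(*Counter(iterable).items())

    # f(i, c) repeats each item of i as many times as the matching entry of c
    f = lambda i,c: chain.from_iterable(map(repeat, i, c))
    n = len(counts)
    # Start from the smallest combination, stored as indices into values
    indices = list(islice(f(count(),counts), r))
    if len(indices) < r:
        return
    while True:
        yield tuple(values[i] for i in indices)
        # Scan from the right for a position that is not yet at its maximum
        for i,j in zip(reversed(range(r)), f(reversed(range(n)), reversed(counts))):
            if indices[i] != j:
                break
        else:
            return
        # Advance that position and refill everything to its right
        j = indices[i]+1
        for i,j in zip(range(i,r), f(count(j), counts[j:])):
            indices[i] = j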

Usage:

>>> t = [2,2,2,2,4]
# elements in t must be hashable
>>> list(combinations_without_repetition(4, iterable=t)) 
[(2, 2, 2, 2), (2, 2, 2, 4)]

# You can pass values and counts separately. For this usage, values don't need to be hashable
# Say you have ['a','b','b','c','c','c'], then since there is 1 of 'a', 2 of 'b', and 3 of 'c', you can do as follows:
>>> list(combinations_without_repetition(3, values=['a','b','c'], counts=[1,2,3]))
[('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'c'), ('b', 'b', 'c'), ('b', 'c', 'c'), ('c', 'c', 'c')]

# combinations_without_repetition() is a generator (and thus an iterator)
# so you can iterate it
>>> for comb in combinations_without_repetition(4, t):
...     print(sum(comb))
...
8   # 2+2+2+2
10  # 2+2+2+4

Note that itertools.combinations() is implemented in C, which means it is much faster than my Python script in most cases. This code works better than the set(itertools.combinations()) method only when there are A LOT MORE repeated combinations than unique combinations.
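The iterative generator above can be hard to follow. As a rough sketch of the same idea (not the code from this answer), a recursive version can decide, for each distinct value in turn, how many copies to take. The name distinct_combos is hypothetical, and elements are assumed to be hashable because Counter is used:

```python
from collections import Counter
from itertools import combinations


def distinct_combos(iterable, r):
    """Yield each distinct r-length combination of values exactly once."""
    # (value, multiplicity) pairs in first-seen order
    items = list(Counter(iterable).items())

    def rec(idx, remaining):
        if remaining == 0:
            yield ()
            return
        if idx == len(items):
            return  # ran out of values before filling the combination
        value, count = items[idx]
        # Take as many copies as allowed first, then fewer, down to none
        for k in range(min(count, remaining), -1, -1):
            for tail in rec(idx + 1, remaining - k):
                yield (value,) * k + tail

    return rec(0, r)


print(list(distinct_combos([2, 2, 2, 2, 4], 4)))
# [(2, 2, 2, 2), (2, 2, 2, 4)]
```

The recursion depth is bounded by the number of distinct values, so this is only a reasonable choice when that number is small.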

Technically, what you get are not actually duplicates; it is simply how itertools.combinations works, as you can see from the description in the linked page:

itertools.combinations(iterable, r)

Return r length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination.

DEMO:

>>> import itertools as it
>>> list(it.combinations([1,2,3,4,5], 4))
[(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5), (2, 3, 4, 5)]
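To make the positional rule visible for the list from the question, here is a small sketch that pairs each value with its index before combining. The variable names are only illustrative:

```python
import itertools

t = [2, 2, 2, 2, 4]

# Pair every value with its position so the chosen positions can be shown
rows = []
for combo in itertools.combinations(enumerate(t), 4):
    positions = tuple(i for i, _ in combo)
    values = tuple(v for _, v in combo)
    rows.append((positions, values))

for positions, values in rows:
    print(positions, values)
# (0, 1, 2, 3) (2, 2, 2, 2)
# (0, 1, 2, 4) (2, 2, 2, 4)
# (0, 1, 3, 4) (2, 2, 2, 4)
# (0, 2, 3, 4) (2, 2, 2, 4)
# (1, 2, 3, 4) (2, 2, 2, 4)
```

All five position tuples are different, which is why itertools.combinations reports five results even though only two distinct value tuples appear.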

So, just as posted in the previous answer, set() will give you the unique values you want:

>>> t = [2,2,2,2,4]
>>> set(it.combinations(t, 4))
{(2, 2, 2, 4), (2, 2, 2, 2)}

This can now be done using the package more-itertools, which, as of version 8.7, has a function called distinct_combinations for this purpose.

>>> from itertools import combinations
>>> t = [2,2,2,2,4]
>>> set(combinations(t, 4))
{(2, 2, 2, 2), (2, 2, 2, 4)}

>>> from more_itertools import distinct_combinations
>>> t = [2,2,2,2,4]
>>> list(distinct_combinations(t,4))
[(2, 2, 2, 2), (2, 2, 2, 4)]

As far as I can tell from my very limited testing, performance is similar to that of the function written by @hahho.


 