
Python combinations without repetitions

I have a list of numbers and I want to generate combinations from it. Given this list:

import itertools

t = [2,2,2,2,4]
c = list(itertools.combinations(t, 4))

The result is:

(2, 2, 2, 2)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)
(2, 2, 2, 4)

but I want to get:

(2, 2, 2, 2)
(2, 2, 2, 4)

Is it possible to eliminate the duplicates without building a new list and iterating over the first one?

As Donkey Kong pointed out with set, you can get the unique values in a list by converting the list to a set:

import itertools

t = [2,2,2,2,4]
c = list(itertools.combinations(t, 4))
unq = set(c)
print(unq)

And the result will be:

{(2, 2, 2, 4), (2, 2, 2, 2)}

If you want to use it as a list, you can convert it back:

result = list(unq)

A cleaner, more concise alternative is:

t = [2,2,2,2,4]
c = set(itertools.combinations(t, 4))
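One thing to keep in mind is that a set has no guaranteed order. If the order in which itertools.combinations emits the tuples matters to you, a small variation (my own sketch, not part of the answer above) is to de-duplicate with dict.fromkeys, which keeps first-seen order on Python 3.7+. Here unique_in_order is just an illustrative name:

```python
import itertools

t = [2, 2, 2, 2, 4]

# dict keys are unique and preserve insertion order (Python 3.7+),
# so this keeps the first occurrence of each combination.
unique_in_order = list(dict.fromkeys(itertools.combinations(t, 4)))
print(unique_in_order)  # [(2, 2, 2, 2), (2, 2, 2, 4)]
```

Like the set version, this still generates every duplicate tuple before discarding it.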

I know this is late but I want to add a point.

set(itertools.combinations(t, 4)) does a fine job in most cases, but it still generates every duplicate combination internally before the set discards it, so it can be computationally heavy. This is especially wasteful when there are only a few unique combinations.
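To get a feel for the size of the problem, here is a rough illustration with a made-up input of 20 twos and a single four (the list and r = 10 are my own example values, and math.comb assumes Python 3.8+):

```python
import itertools
import math

t = [2] * 20 + [4]   # 21 elements, only two distinct values
r = 10

# Number of tuples itertools.combinations emits: C(21, 10)
generated = math.comb(len(t), r)

# Number of distinct tuples that survive in the set
unique = len(set(itertools.combinations(t, r)))

print(generated)  # 352716
print(unique)     # 2
```

Almost all of the generated tuples are thrown away by the set.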

This one iterates only unique combinations:

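from itertools import chain, repeat, count, islice
from collections import Counter

def combinations_without_repetition(r, iterable=None, values=None, counts=None):
    # Either pass `iterable` (hashable elements), or `values` and `counts` directly.
    if iterable:
        values, counts = zip(*Counter(iterable).items())

    # f(i, c) repeats each item of i as many times as the matching entry of c.
    f = lambda i,c: chain.from_iterable(map(repeat, i, c))
    n = len(counts)
    # Start with the first r slots: value index 0 counts[0] times, then 1, ...
    indices = list(islice(f(count(),counts), r))
    if len(indices) < r:
        return  # fewer than r elements in total
    while True:
        yield tuple(values[i] for i in indices)
        # Find the rightmost position that is not yet at its maximum.
        for i,j in zip(reversed(range(r)), f(reversed(range(n)), reversed(counts))):
            if indices[i] != j:
                break
        else:
            return  # every position is at its maximum: done
        # Advance that position and refill everything to its right.
        j = indices[i]+1
        for i,j in zip(range(i,r), f(count(j), counts[j:])):
            indices[i] = j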

Usage:

>>> t = [2,2,2,2,4]
# elements in t must be hashable
>>> list(combinations_without_repetition(4, iterable=t)) 
[(2, 2, 2, 2), (2, 2, 2, 4)]

# You can pass values and counts separately. For this usage, values don't need to be hashable
# Say you have ['a','b','b','c','c','c'], then since there is 1 of 'a', 2 of 'b', and 3 of 'c', you can do as follows:
>>> list(combinations_without_repetition(3, values=['a','b','c'], counts=[1,2,3]))
[('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'c'), ('b', 'b', 'c'), ('b', 'c', 'c'), ('c', 'c', 'c')]

# combinations_without_repetition() is a generator (and thus an iterator)
# so you can iterate it
>>> for comb in combinations_without_repetition(4, t):
...     print(sum(comb))
...
8   # 2+2+2+2
10  # 2+2+2+4

Note that itertools.combinations() is implemented in C, which means it is much faster than my Python code in most cases. This code only beats the set(itertools.combinations()) approach when there are A LOT MORE duplicate combinations than unique ones.
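If the index-based generator above is hard to follow, the same idea can be sketched recursively: group equal values with Counter, then decide for each distinct value how many copies to take. This is my own simplified sketch, not the author's code; unique_combinations and rec are hypothetical names, and it assumes the elements are hashable.

```python
from collections import Counter
from itertools import combinations

def unique_combinations(iterable, r):
    """Yield each distinct r-length combination of a multiset exactly once."""
    # [(value, count), ...] in first-seen order
    items = list(Counter(iterable).items())

    def rec(start, remaining):
        if remaining == 0:
            yield ()
            return
        if start == len(items):
            return  # ran out of values before filling the tuple
        value, cnt = items[start]
        # Take as many copies of `value` as allowed first, then fewer.
        for k in range(min(cnt, remaining), -1, -1):
            for tail in rec(start + 1, remaining - k):
                yield (value,) * k + tail

    return rec(0, r)

print(list(unique_combinations([2, 2, 2, 2, 4], 4)))
# [(2, 2, 2, 2), (2, 2, 2, 4)]
```

Equal values are always placed next to each other in the output, so on an input such as [2, 4, 2] the result can differ from set(itertools.combinations(...)), which would keep (2, 4) and (4, 2) as separate tuples.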

Technically, what you get are not actually duplicates. It's simply how itertools.combinations works, as you can see if you read the description in the linked page:

itertools.combinations(iterable, r)

Return r length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination.

DEMO:

>>> import itertools as it
>>> list(it.combinations([1,2,3,4,5], 4))
[(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 4, 5), (1, 3, 4, 5), (2, 3, 4, 5)]
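To see the "position, not value" rule on the list from the question, one option (just an illustration; rows is a name I made up) is to combine the positions and look up the values afterwards:

```python
from itertools import combinations

t = [2, 2, 2, 2, 4]

# Combine index positions instead of values, then map each position back
# to its value to see which elements every tuple was built from.
rows = [(idx, tuple(t[i] for i in idx))
        for idx in combinations(range(len(t)), 4)]

for idx, values in rows:
    print(idx, values)
# (0, 1, 2, 3) (2, 2, 2, 2)
# (0, 1, 2, 4) (2, 2, 2, 4)
# (0, 1, 3, 4) (2, 2, 2, 4)
# (0, 2, 3, 4) (2, 2, 2, 4)
# (1, 2, 3, 4) (2, 2, 2, 4)
```

Every index tuple is different, so from the point of view of itertools.combinations nothing is repeated, even though four of the value tuples look the same.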

So, just as posted on the previous answer, set() will give you the unique values you want:

>>> set(it.combinations(t, 4))
{(2, 2, 2, 4), (2, 2, 2, 2)}

This can now be done using the package more-itertools which, as of version 8.7, has a function called distinct_combinations to achieve this.

>>> from itertools import combinations
>>> t = [2,2,2,2,4]
>>> set(combinations(t, 4))
{(2, 2, 2, 2), (2, 2, 2, 4)}

>>> from more_itertools import distinct_combinations
>>> t = [2,2,2,2,4]
>>> list(distinct_combinations(t,4))
[(2, 2, 2, 2), (2, 2, 2, 4)]

As far as I can tell from my very limited testing, performance is similar to the function written by @hahho.


 