简体   繁体   中英

Python - itertools.product without using element more than once

I have a list of lists, and I would like to duplicate the effect of itertools.product() without using any element more than once.

>>> list = [['A', 'B'], ['C', 'D'], ['A', 'B']]
>>> [''.join(e) for e in itertools.product(*list)]
['ACA', 'ACB', 'ADA', 'ADB', 'BCA', 'BCB', 'BDA', 'BDB']
>>> # Desired output: ['ACB', 'ADB', 'BCA', 'BDA']

The list I need to use this on is too large to compute itertools.product and remove the unneeded elements. (25 billion permutations from itertools.product , while my desired output only has ~500,000). Preferably, an answer will be iterable.

Edit: I know "product" is the wrong term for what I need, but I'm struggling to find the word I'm looking for.

Edit2: This is the list that I desire to perform this operation on:

[['A', 'H', 'L'], ['X', 'B', 'I'], ['Q', 'C', 'V'], ['D', 'N'], ['E', 'F'], ['E', 'F'], ['G'], ['A', 'H', 'L'], ['X', 'B', 'I'], ['W', 'U', 'J', 'K', 'M'], ['W', 'U', 'J', 'K', 'M'], ['A', 'H', 'L'], ['W', 'U', 'J', 'K', 'M'], ['D', 'N'], ['P', 'O', 'T'], ['P', 'O', 'T'], ['Q', 'C', 'V'], ['R'], ['S'], ['P', 'O', 'T'], ['W', 'U', 'J', 'K', 'M'], ['Q', 'C', 'V'], ['W', 'U', 'J', 'K', 'M'], ['X', 'B', 'I']]
test1 = [['A', 'B'], ['C', 'D'], ['A', 'B']]                                                                                           
test2 = [['A', 'H', 'L'], ['X', 'B', 'I'], ['Q', 'C', 'V'], ['D', 'N'], ['E', 'F'], ['E', 'F'], ['G'], ['A', 'H', 'L'],
         ['X', 'B', 'I'], ['W', 'U', 'J', 'K', 'M'], ['W', 'U', 'J', 'K', 'M'], ['A', 'H', 'L'],
         ['W', 'U', 'J', 'K', 'M'], ['D', 'N'], ['P', 'O', 'T'], ['P', 'O', 'T'], ['Q', 'C', 'V'], ['R'], ['S'],
         ['P', 'O', 'T'], ['W', 'U', 'J', 'K', 'M'], ['Q', 'C', 'V'], ['W', 'U', 'J', 'K', 'M'], ['X', 'B', 'I']]


def prod(curr, *others):
    if not others:
        for x in curr:
            yield {x} # (x,) for tuples
    else:
        for o in prod(*others):
            for c in curr:
                if c not in o:
                    yield {c, *o} # (c, *o) for tuples

print([''.join(x) for x in prod(*test1)])
# ['ABC', 'ABD', 'ABC', 'ABD'] 
print(sum(1 for x in prod(*test2)))
# 622080

The longer input takes about five seconds to run on my machine. I use set s to pass values around because they are much more efficent than tuples or lists when it comes to membership checks. If necessary, you can use tuples, it will just be slower.

Some questions to think about: does order matter? What do you want to happen when we can't use an item from the current list (because they've all already been used)?

A simple stack-based implementation:

def product1(l): return product1_(l,0,[])
def product1_(l,i,buf):
  if i==len(l): yield buf
  else:
    for x in l[i]:
      if x not in buf:
        buf.append(x)
        yield from product1_(l,i+1,buf)
        buf.pop()

This is a bit slower than Patrick Haugh's answer (I get 18 s for your test case), but it gives the results in a predictable order.

Note that you have to process the values as it generates "them", since they're all the same list buf ; you could write yield tuple(buf) or yield "".join(buf) to generate separate "cooked" values (at a cost of less than one additional second).

If the values are letters, you could use a "bitmask" list to implement the collision test, which reduces the time to about 13 s (but using a set is just as fast). Other possible optimizations include processing lists with fewer eligible elements first, to reduce backtracking; this can get this case down to 11 s.

Your specific case has an interesting property. If we arrange it in a counter, we see that every list occurs as many times as its entries:

Counter({('A', 'H', 'L'): 3,
         ('D', 'N'): 2,
         ('E', 'F'): 2,
         ('G',): 1,
         ('P', 'O', 'T'): 3,
         ('Q', 'C', 'V'): 3,
         ('R',): 1,
         ('S',): 1,
         ('W', 'U', 'J', 'K', 'M'): 5,
         ('X', 'B', 'I'): 3})

In other words, ignoring order, the sequences you want are the cartesian products of permutations of your lists. Suppose your list is called l . Then we can assemble a list of all the permutations of the sublists and take their product:

c = set(tuple(i) for i in l)
permutations = [list(itertools.permutations(i)) for i in c]
permutation_products = itertools.products(*permutations)

An element of permutation_products looks something like:

(('W', 'U', 'J', 'K', 'M'),
 ('E', 'F'),
 ('X', 'B', 'I'),
 ('Q', 'C', 'V'),
 ('P', 'O', 'T'),
 ('D', 'N'),
 ('G',),
 ('S',),
 ('R',),
 ('A', 'L', 'H'))

We have to get it back into the right order. Suppose our permutation is called perm . For each sublist of your list, we have to locate the correct element of perm and then take the next letter in the permutation. We can do this by making a dictionary:

perm_dict = {frozenset(p): list(p) for p in perm}

Then, to construct a single permutation, we have:

s = "".join([perm_dict[frozenset(i)].pop() for i in l])

We can combine all this into a generator:

def permute_list(l):
    c = set(tuple(i) for i in l)
    permutations = [list(itertools.permutations(i)) for i in c]
    permutation_products = itertools.product(*permutations)
    for perm in permutation_products:
        perm_dict = {frozenset(p): list(p) for p in perm}
        yield "".join([perm_dict[frozenset(i)].pop() for i in l])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM