简体   繁体   中英

Find in python combinations of mutually exclusive sets from a list's elements

In a project I am currently working on I have implemented about 80% of what I want my program to do and I am very happy with the results.

In the remaining 20% I am faced with a problem which puzzles me a bit on how to solve. Here it is:

I have come up with a list of lists which contain several numbers (arbitrary length) For example:

listElement[0] = [1, 2, 3]
listElement[1] = [3, 6, 8]
listElement[2] = [4, 9]
listElement[4] = [6, 11]
listElement[n] = [x, y, z...]

where n could reach up to 40,000 or so.

Assuming each list element is a set of numbers (in the mathematical sense), what I would like to do is to derive all the combinations of mutually exclusive sets; that is, like the powerset of the above list elements, but with all non-disjoint-set elements excluded.

So, to continue the example with n=4, I would like to come up with a list that has the following combinations:

newlistElement[0] = [1, 2, 3]
newlistElement[1] = [3, 6, 8]
newlistElement[2] = [4, 9]
newlistElement[4] = [6, 11] 
newlistElement[5] = [[1, 2, 3], [4, 9]]
newlistElement[6] = [[1, 2, 3], [6, 11]]
newlistElement[7] = [[1, 2, 3], [4, 9], [6, 11]]
newlistElement[8] = [[3, 6, 8], [4, 9]]
newlistElement[9] = [[4, 9], [6, 11]

An invalid case, for example would be combination [[1, 2, 3], [3, 6, 8]] because 3 is common in two elements. Is there any elegant way to do this? I would be extremely grateful for any feedback.

I must also specify that I would not like to do the powerset function, because the initial list could have quite a large number of elements (as I said n could go up to 40000), and taking the powerset with so many elements would never finish.

I'd use a generator:

import itertools

def comb(seq):
   for n in range(1, len(seq)):
      for c in itertools.combinations(seq, n): # all combinations of length n
         if len(set.union(*map(set, c))) == sum(len(s) for s in c): # pairwise disjoint?
            yield list(c)

for c in comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
   print c

This produces:

[[1, 2, 3]]
[[3, 6, 8]]
[[4, 9]]
[[6, 11]]
[[1, 2, 3], [4, 9]]
[[1, 2, 3], [6, 11]]
[[3, 6, 8], [4, 9]]
[[4, 9], [6, 11]]
[[1, 2, 3], [4, 9], [6, 11]]

If you need to store the results in a single list:

print list(comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]))

The following is a recursive generator:

def comb(input, lst = [], lset = set()):
   if lst:
      yield lst
   for i, el in enumerate(input):
      if lset.isdisjoint(el):
         for out in comb(input[i+1:], lst + [el], lset | set(el)):
            yield out

for c in comb([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
   print c

This is likely to be a lot more efficient than the other solutions in situations where a lot of sets have common elements (of course in the worst case it still has to iterate over the 2**n elements of the powerset).

The method used in the program below is similar to a couple of previous answers in excluding not-disjoint sets and therefore usually not testing all combinations. It differs from previous answers by greedily excluding all the sets it can, as early as it can. This allows it to run several times faster than NPE's solution. Here is a time comparison of the two methods, using input data with 200, 400, ... 1000 size-6 sets having elements in the range 0 to 20:

Set size =   6,  Number max =  20   NPE method
  0.042s  Sizes: [200, 1534, 67]
  0.281s  Sizes: [400, 6257, 618]
  0.890s  Sizes: [600, 13908, 2043]
  2.097s  Sizes: [800, 24589, 4620]
  4.387s  Sizes: [1000, 39035, 9689]

Set size =   6,  Number max =  20   jwpat7 method
  0.041s  Sizes: [200, 1534, 67]
  0.077s  Sizes: [400, 6257, 618]
  0.167s  Sizes: [600, 13908, 2043]
  0.330s  Sizes: [800, 24589, 4620]
  0.590s  Sizes: [1000, 39035, 9689]

In the above data, the left column shows execution time in seconds. The lists of numbers show how many single, double, or triple unions occurred. Constants in the program specify data set sizes and characteristics.

#!/usr/bin/python
from random import sample, seed
import time
nsets,   ndelta,  ncount, setsize  = 200, 200, 5, 6
topnum, ranSeed, shoSets, shoUnion = 20, 1234, 0, 0
seed(ranSeed)
print 'Set size = {:3d},  Number max = {:3d}'.format(setsize, topnum)

for casenumber in range(ncount):
    t0 = time.time()
    sets, sizes, ssum = [], [0]*nsets, [0]*(nsets+1);
    for i in range(nsets):
        sets.append(set(sample(xrange(topnum), setsize)))

    if shoSets:
        print 'sets = {},  setSize = {},  top# = {},  seed = {}'.format(
            nsets, setsize, topnum, ranSeed)
        print 'Sets:'
        for s in sets: print s

    # Method by jwpat7
    def accrue(u, bset, csets):
        for i, c in enumerate(csets):
            y = u + [c]
            yield y
            boc = bset|c
            ts = [s for s in csets[i+1:] if boc.isdisjoint(s)]
            for v in accrue (y, boc, ts):
                yield v

    # Method by NPE
    def comb(input, lst = [], lset = set()):
        if lst:
            yield lst
        for i, el in enumerate(input):
            if lset.isdisjoint(el):
                for out in comb(input[i+1:], lst + [el], lset | set(el)):
                    yield out

    # Uncomment one of the following 2 lines to select method
    #for u in comb (sets):
    for u in accrue ([], set(), sets):
        sizes[len(u)-1] += 1
        if shoUnion: print u
    t1 = time.time()
    for t in range(nsets-1, -1, -1):
        ssum[t] = sizes[t] + ssum[t+1]
    print '{:7.3f}s  Sizes:'.format(t1-t0), [s for (s,t) in zip(sizes, ssum) if t>0]
    nsets += ndelta

Edit: In function accrue , arguments (u, bset, csets) are used as follows:
• u = list of sets in current union of sets
• bset = "big set" = flat value of u = elements already used
• csets = candidate sets = list of sets eligible to be included
Note that if the first line of accrue is replaced by
def accrue(csets, u=[], bset=set()):
and the seventh line by
for v in accrue (ts, y, boc):
(ie, if parameters are re-ordered and defaults given for u and bset) then accrue can be invoked via [accrue(listofsets)] to produce its list of compatible unions.

Regarding the ValueError: zero length field name in format error mentioned in a comment as occurring when using Python 2.6, try the following.

# change:
    print "Set size = {:3d}, Number max = {:3d}".format(setsize, topnum)
# to:
    print "Set size = {0:3d}, Number max = {1:3d}".format(setsize, topnum)

Similar changes (adding appropriate field numbers) may be needed in other formats in the program. Note, the what's new in 2.6 page says “Support for the str.format() method has been backported to Python 2.6”. While it does not say whether field names or numbers are required, it does not show examples without them. By contrast, either way works in 2.7.3.

using itertools.combinations , set.intersection and for-else loop:

from itertools import *
lis=[[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]
def func(lis):
    for i in range(1,len(lis)+1):
       for x in combinations(lis,i):
          s=set(x[0])
          for y in x[1:]:
              if len(s & set(y)) != 0:
                  break
              else:
                  s.update(y)    
          else:
              yield x


for item in func(lis):
    print item

output:

([1, 2, 3],)
([3, 6, 8],)
([4, 9],)
([6, 11],)
([1, 2, 3], [4, 9])
([1, 2, 3], [6, 11])
([3, 6, 8], [4, 9])
([4, 9], [6, 11])
([1, 2, 3], [4, 9], [6, 11])

Similar to NPE's solution , but it's without recursion and it returns a list:

def disjoint_combinations(seqs):
    disjoint = []
    for seq in seqs:
        disjoint.extend([(each + [seq], items.union(seq))
                            for each, items in disjoint
                                if items.isdisjoint(seq)])
        disjoint.append(([seq], set(seq)))
    return [each for each, _ in disjoint]

for each in disjoint_combinations([[1, 2, 3], [3, 6, 8], [4, 9], [6, 11]]):
    print each

Result:

[[1, 2, 3]]
[[3, 6, 8]]
[[1, 2, 3], [4, 9]]
[[3, 6, 8], [4, 9]]
[[4, 9]]
[[1, 2, 3], [6, 11]]
[[1, 2, 3], [4, 9], [6, 11]]
[[4, 9], [6, 11]]
[[6, 11]]

One-liner without employing the itertools package. Here's your data:

lE={}
lE[0]=[1, 2, 3]
lE[1] = [3, 6, 8]
lE[2] = [4, 9]
lE[4] = [6, 11]

Here's the one-liner:

results=[(lE[v1],lE[v2]) for v1 in lE for v2  in lE if (set(lE[v1]).isdisjoint(set(lE[v2])) and v1>v2)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM