Group Python lists based on repeated items

This question is very similar to "Group Python list of lists into groups based on overlapping items"; in fact, it could be called a duplicate.

Basically, I have a list of sub-lists where each sub-list contains some number of integers (the count varies between sub-lists). I need to group all sub-lists that share at least one integer.

I'm asking a separate question because I've tried to adapt Martijn Pieters' great answer without success.

Here's the MWE:

def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]


gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]

for i, group in enumerate(grouper(gr)):
    print('g{}:'.format(i), group)

and the output I get is:

g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
g4: [[86, 84, 81, 82, 83, 85]]

The last group, g4, should have been merged with g3, since the lists inside them share the items 81, 84 and 86, and even a single repeated element should be enough for them to be merged.

I'm not sure if I'm applying the code wrong, or if there's something wrong with the code.

This sounds like set consolidation. Turn each sub-list into a set: you are interested in the contents, not the order, so sets are the best choice of data structure. See http://rosettacode.org/wiki/Set_consolidation
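As for why the grouper in the question leaves g4 on its own: it stops at the first overlapping group (the break), so a sub-list that overlaps two existing groups joins one of them and never links the two. Below is a minimal sketch of a variant that absorbs every overlapping group instead. The name merge_groups and its local variables are hypothetical, not taken from the linked answer.

```python
def merge_groups(sequence):
    """Group sub-lists that share at least one item, directly or transitively."""
    groups = []  # (members, sublists) pairs; member sets stay pairwise disjoint

    for item in sequence:
        members = set(item)
        absorbed = []   # sub-lists from every existing group that overlaps item
        remaining = []  # groups left untouched by this item

        for other_members, other_sublists in groups:
            if other_members & members:
                # Overlap: fold this group in, and keep scanning instead of
                # stopping at the first match.
                members |= other_members
                absorbed.extend(other_sublists)
            else:
                remaining.append((other_members, other_sublists))

        remaining.append((members, absorbed + [item]))
        groups = remaining

    return [sublists for _, sublists in groups]
```

With the gr list from the question this should yield four groups, with [86, 84, 81, 82, 83, 85] ending up in the same group as the three sub-lists of the original g3.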

You can describe the merge you want as either a set consolidation or a connected-components problem. I tend to use an off-the-shelf set consolidation algorithm and then adapt it to the particular situation. For example, if I understand correctly, you could use something like

def consolidate(sets):
    # http://rosettacode.org/wiki/Set_consolidation#Python:_Iterative
    setlist = [s for s in sets if s]
    for i, s1 in enumerate(setlist):
        if s1:
            for s2 in setlist[i+1:]:
                intersection = s1.intersection(s2)
                if intersection:
                    s2.update(s1)
                    s1.clear()
                    s1 = s2
    return [s for s in setlist if s]

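def wrapper(seqs):
    # Merge the sub-lists, viewed as sets, into disjoint consolidated sets.
    consolidated = consolidate(map(set, seqs))
    # Map each integer to the index of the consolidated set that contains it.
    groupmap = {x: i for i, seq in enumerate(consolidated) for x in seq}
    output = {}
    for seq in seqs:
        # The first element identifies the group; assumes seq is non-empty.
        target = output.setdefault(groupmap[seq[0]], [])
        target.append(seq)
    return list(output.values())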

which gives

>>> for i, group in enumerate(wrapper(gr)):
...     print('g{}:'.format(i), group)
...     
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81], [86, 68, 67, 84]]

(The order of the groups is not guaranteed on Python versions older than 3.7, where dictionaries do not preserve insertion order.)
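The connected-components view mentioned above can also be sketched directly with a small union-find (disjoint-set) structure: treat each integer as a node, link all integers that appear in the same sub-list, then bucket the sub-lists by the root of their first integer. This is only an illustrative sketch. The names group_by_shared_items, find and union are hypothetical, and empty sub-lists are assumed to be skipped.

```python
def group_by_shared_items(seqs):
    """Group sub-lists whose items fall in the same connected component."""
    parent = {}  # maps each item to its parent in the union-find forest

    def find(x):
        # Register unseen items as their own root, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every item in a sub-list to that sub-list's first item.
    for seq in seqs:
        for x in seq[1:]:
            union(seq[0], x)

    # Bucket sub-lists by the root of their first item.
    groups = {}
    for seq in seqs:
        if seq:  # assumption: empty sub-lists share nothing and are skipped
            groups.setdefault(find(seq[0]), []).append(seq)
    return list(groups.values())
```

Calling group_by_shared_items(gr) on the list from the question should produce the same four groups as wrapper(gr). Unlike wrapper, it does not build intermediate sets, but it does require the items to be hashable, just as the set-based version does.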
