
Finding Overlapping Lists in a List of Lists

I have a list of lists that need to be merged based on common occurrences of list items. Lists that share elements need to be merged together to form clusters.

I considered a breadth-first traversal to do this, but because of the way the list of lists is arranged, the traversal is difficult to implement.

Example list of lists:

input: 
[
 [1,2,3],
 [2,4,5],
 [4,6,8],
 [9,10,16],
 [16,18,19],
 [20,21,22]
]
output: [[1,2,3,4,5,6,8], [9,10,16,18,19], [20,21,22]]

The first three lists need to be merged into a single list (the first and second lists share 2, and the second and third lists share 4). The fourth and fifth need to be merged because they share 16. The sixth list is not merged with any other list, as it doesn't share any element with the others.

While this can be done in O(n^2) time (n being the number of lists), I am trying to find the most efficient way possible.

You can do this in O(N * log N), where N is the total number of items across all lists.

The idea is simple, using a Union-Find (disjoint-set) data structure:

  1. Create one disjoint set for each unique item in the input
  2. For each list, merge the disjoint sets of every pair of neighboring items
  3. Collect the result by grouping the items by the set they belong to

Sample code:

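def find(item, parent):
    # A negative value marks item as the root of its set.
    # This sentinel assumes the items themselves are non-negative numbers.
    if parent[item] < 0:
        return item
    # Path compression: point item directly at the root of its set.
    parent[item] = find(parent[item], parent)
    return parent[item]

def union(item1, item2, parent):
    root1 = find(item1, parent)
    root2 = find(item2, parent)
    if root1 != root2:
        parent[root2] = root1

lists = [
 [1,2,3],
 [2,4,5],
 [4,6,8],
 [9,10,16],
 [16,18,19],
 [20,21,22]
]

parent = {}

# Step 1: every unique item starts as the root of its own set.
for lst in lists:
    for item in lst:
        parent[item] = -1

# Step 2: merge the sets of neighboring items within each list.
for lst in lists:
    for i in range(1, len(lst)):
        union(lst[i-1], lst[i], parent)

# Step 3: group the items by the root of their set.
ans = {}
for lst in lists:
    for item in lst:
        ans.setdefault(find(item, parent), []).append(item)

# Items that occur in several lists were appended more than once;
# converting each group to a set removes the duplicates.
ans = [set(x) for x in ans.values()]
print(ans)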
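
The same idea can also be wrapped in a reusable function. The sketch below is one possible variant, not the code above verbatim: the name `merge_overlapping` is made up for illustration, `find` is written iteratively so that long parent chains cannot hit Python's recursion limit, and union by size is added, which the sample code does not do. It assumes the items are hashable and mutually comparable, and it returns sorted lists to match the output format in the question.

```python
def merge_overlapping(lists):
    """Merge lists that share at least one item (hypothetical helper)."""
    parent = {}  # item -> parent item; a root is its own parent
    size = {}    # root item -> number of items in its set

    def find(item):
        # Walk up to the root of the set.
        root = item
        while parent[root] != root:
            root = parent[root]
        # Path compression: repoint every node on the path at the root.
        while parent[item] != root:
            parent[item], item = root, parent[item]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for lst in lists:
        # Register unseen items as singleton sets.
        for item in lst:
            parent.setdefault(item, item)
            size.setdefault(item, 1)
        # Join neighboring items so the whole list ends up in one set.
        for prev, cur in zip(lst, lst[1:]):
            union(prev, cur)

    # Group every item under the root of its set.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return [sorted(group) for group in groups.values()]


data = [[1, 2, 3], [2, 4, 5], [4, 6, 8], [9, 10, 16], [16, 18, 19], [20, 21, 22]]
print(merge_overlapping(data))
```

Because each unique item is stored once as a dictionary key, no de-duplication step is needed when collecting the groups.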

Your inner lists don't have repeated elements. If that is the general case, then the set consolidation task on Rosetta Code has a Python solution that would work.
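
For reference, set consolidation can be sketched roughly as follows. This is a minimal illustration of the idea and not the Rosetta Code listing itself; the function name `consolidate` is an assumption chosen for this example. It keeps a list of mutually disjoint groups and folds each incoming set into every group it overlaps with.

```python
def consolidate(sets):
    """Merge sets that share any element (illustrative sketch)."""
    result = []  # invariant: the groups in result are pairwise disjoint
    for current in map(set, sets):
        untouched = []
        for group in result:
            if group & current:
                # Overlap found: absorb the existing group into current.
                current |= group
            else:
                untouched.append(group)
        # The merged set replaces every group it absorbed.
        untouched.append(current)
        result = untouched
    return result


data = [[1, 2, 3], [2, 4, 5], [4, 6, 8], [9, 10, 16], [16, 18, 19], [20, 21, 22]]
print(consolidate(data))
```

Each incoming set is compared against every existing group, so this sketch can take quadratic time in the number of groups when few of them merge. It is simple, but probably not the most efficient option for large inputs.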
