How can I get the union of 2D list items when any intersection occurs (in an efficient way)?

I have a 2D list in Python:

list = [[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]]

I would like to get the union of list items wherever an intersection occurs. For example, [9, 2, 7], [9, 7] and [2, 7] intersect in one or more numbers, so their union is [9, 2, 7].

How can I get the final list below in an efficient way?

finalList = [[9,2,7], [0, 1, 5, 4]]

NB: the order of the numbers is not important.

Here is a theoretical answer. This is a connected-components problem; you build a graph as follows:

  • there is a vertex for each set in the list
  • there is an edge between two sets when they have a common value.

What you want is the union of the sets in each connected component of the graph.
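The graph described above can be sketched directly. This is only an illustration under my own assumptions: `union_by_components` is a hypothetical name, vertices are the indices of the sublists, and the edges are found by comparing every pair of sets, which is quadratic in the number of sublists.

```python
def union_by_components(lists):
    """Union the sublists in each connected component of the 'shares a value' graph."""
    sets = [set(l) for l in lists]
    n = len(sets)
    # Edge between vertices i and j when the two sets have a common value.
    neighbours = {i: [j for j in range(n) if j != i and sets[i] & sets[j]]
                  for i in range(n)}
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            i = stack.pop()
            component |= sets[i]
            for j in neighbours[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        result.append(sorted(component))
    return result

print(union_by_components([[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]]))
# [[2, 7, 9], [0, 1, 4, 5]]
```

Each group is sorted only to make the output readable, since the question says order does not matter.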

You have a graph problem. You want to build connected components in a graph whose vertices are the elements of your sublists, where two vertices have an edge between them if they are elements of the same sublist. You could build an adjacency-list representation of your input and run a graph search algorithm over it, or you could iterate over your input and build disjoint sets. Here's a slightly modified connected-components algorithm I wrote up for a similar question:

import collections

# build an adjacency list representation of your input
# (input_list is the 2D list of sublists from the question)
graph = collections.defaultdict(set)
for l in input_list:
    if l:
        first = l[0]
        for element in l:
            graph[first].add(element)
            graph[element].add(first)

# breadth-first search the graph to produce the output
output = []
marked = set() # a set of all nodes whose connected component is known
for node in graph:
    if node not in marked:
        # this node is not in any previously seen connected component
        # run a breadth-first search to determine its connected component
        frontier = set([node])
        connected_component = []
        while frontier:
            marked |= frontier
            connected_component.extend(frontier)

            # find all unmarked nodes directly connected to frontier nodes
            # they will form the new frontier
            new_frontier = set()
            for member in frontier:
                new_frontier |= graph[member] - marked
            frontier = new_frontier
        output.append(tuple(connected_component))
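The disjoint-set alternative mentioned above could look roughly like the following. It is a minimal sketch, not the answerer's code: the names `merge_with_disjoint_sets`, `find` and `union` are my own, and it uses a plain dictionary as the parent table with path halving and no rank heuristic.

```python
def merge_with_disjoint_sets(lists):
    """Group elements that are linked through shared sublists (union-find)."""
    parent = {}

    def find(x):
        # Register unseen elements, then walk to the root, halving the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every element of a sublist to that sublist's first element.
    for sub in lists:
        for element in sub:
            union(sub[0], element)

    # Collect the elements under their root.
    groups = {}
    for element in list(parent):
        groups.setdefault(find(element), []).append(element)
    return [sorted(group) for group in groups.values()]

print(merge_with_disjoint_sets([[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]]))
```

Like the breadth-first search, this touches each element of the input a small number of times, so it avoids comparing every pair of sublists.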

Here is an answer without any imports:

def func(L):
    r = []
    cur = set()
    for l in L:
        if not cur:
            cur = set(l)
        if any(i in cur for i in l):
            cur.update(l)
        else:
            r.append(cur)
            cur = set(l)
    r.append(cur)
    while len(r)>1:
        if any(i in r[0] for i in r[-1]):
            r[-1].update(r.pop(0))
        else:
            break
    return r

Using it:

>>> func([[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]])
[set([9, 2, 7]), set([0, 1, 4, 5])]
>>> func([[0],[1],[2],[0,1]])
[set([2]), set([0, 1])]

You can drop the sets and return a list of lists by changing r.append(cur) into r.append(list(cur)), but I think it is neater to return sets.
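If lists are wanted, another option is to convert the result after the function returns. That leaves the final while loop working on sets, which matters because it calls update, a method lists do not have. A minimal sketch, where `groups` is a hypothetical stand-in for the value returned by func:

```python
# `groups` is a hypothetical stand-in for the list of sets returned by func.
groups = [{9, 2, 7}, {0, 1, 4, 5}]

# Convert each set to a list; sorting just gives a stable, readable order.
as_lists = [sorted(group) for group in groups]
print(as_lists)  # [[2, 7, 9], [0, 1, 4, 5]]
```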

This one uses sets:

>>> l = [[9, 2, 7], [9, 7], [2, 7], [1, 0], [0, 5, 4]]
>>> done = []
>>> while len(done) != len(l):
    start = min([i for i in range(len(l)) if i not in done])
    ref = set(l[start])
    for j in [i for i in range(len(l)) if i not in done]:
        if set(l[j]) & ref:
            done.append(j)
            ref |= set(l[j])
    print(ref)


set([2, 7, 9])
set([0, 1, 4, 5])

I propose that you examine each pair of lists with itertools:

import itertools, numpy

ls_tmp_rmv = []

while True:
    ls_tmp = []

    for s, k in itertools.combinations(lisst, 2):
        if len(set(s).intersection( set(k) )) > 0:

            ls_tmp = ls_tmp + [numpy.unique(s + k).tolist()]

            if [s] not in ls_tmp:
                ls_tmp_rmv = ls_tmp_rmv + [s]
            if [k] not in ls_tmp:
                ls_tmp_rmv = ls_tmp_rmv + [k]
        else:
            ls_tmp = ls_tmp + [s] + [k]

    ls_tmp = [ls_tmp[i] for i in range(len(ls_tmp)) if ls_tmp[i] 
                    not in ls_tmp[i+1:]]
    ls_tmp_rmv = [ls_tmp_rmv[i] for i in range(len(ls_tmp_rmv)) 
                     if ls_tmp_rmv[i] not in ls_tmp_rmv[i+1:]]

    ls_tmp = [X for X in ls_tmp if X not in ls_tmp_rmv]

    if ls_tmp == lisst :
        break
    else:
        lisst = ls_tmp

print(lisst)

You take all pairs of lists in your list and check whether they have elements in common. If so, you merge the pair. If not, you add both members of the pair. You keep track of the lists you merged so that you can remove them from the resulting list at the end.
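The same repeated pairwise merge can also be sketched with the standard library only. This is a simplified variant under my own assumptions rather than the code above: `merge_overlapping` is a hypothetical name, it works on sets instead of numpy arrays, and it restarts the pair scan after every merge.

```python
from itertools import combinations

def merge_overlapping(lists):
    """Repeatedly merge any two groups that share an element."""
    groups = [set(l) for l in lists]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            if groups[i] & groups[j]:
                # j > i, so index i is still valid after the pop.
                groups[i] |= groups.pop(j)
                merged = True
                break  # the remaining index pairs are stale; rescan
    return [sorted(group) for group in groups]

print(merge_overlapping([[1, 2], [2, 3], [8, 9], [3, 4]]))
# [[1, 2, 3, 4], [8, 9]]
```

Restarting after each merge keeps the logic simple at the cost of extra pair checks, so it is best suited to short inputs.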

With the list

lisst = [[1,2], [2,3], [8,9], [3,4]]

you do get

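[[1, 2, 3, 4], [8, 9]]

def intersection_groups(lst):
    # a list comprehension (rather than map) keeps lst indexable on Python 3
    lst = [set(s) for s in lst]
    a, b = 0, 1
    while a < len(lst) - 1:
        while b < len(lst):
            if not lst[a].isdisjoint(lst[b]):
                lst[a].update(lst.pop(b))
                # lst[a] grew, so recheck the sets that were skipped earlier
                b = a + 1
            else:
                b += 1
        a, b = a + 1, a + 2
    return lst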
