
Calculating the number of graphs created and the number of vertices in each graph from a list of edges

Given a list of edges such as edges = [[1,2],[2,3],[3,1],[4,5]],

I need to find how many graphs these edges create, that is, how many connected components they form, and then the number of vertices in each component.

However, I need to handle 10^5 edges, and my current approach is too slow for inputs that large.

My algorithm takes the list edges = [[1,2],[2,3],[3,1],[4,5]], converts each edge to a set, and repeatedly merges any sets that intersect. The output is a list of components such as graphs = [[1,2,3],[4,5]].

There are two connected components: [1,2,3] are connected, and [4,5] are connected as well.

Is there a better way to do this?

def mergeList(edges):
    sets = [set(x) for x in edges if x]
    m = 1
    while m:
        m = 0
        res = []
        while sets:
            common, r = sets[0], sets[1:]
            sets = []
            for x in r:
                if x.isdisjoint(common):
                    sets.append(x)
                else:
                    m = 1
                    common |= x
            res.append(common)
        sets = res
    return sets

I would like to do this with a dictionary or some other efficient structure, because the approach above is too slow.

A basic iterative graph traversal in Python isn't too bad.

import collections


def connected_components(edges):
    # build the graph
    neighbors = collections.defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # traverse the graph
    sizes = []
    visited = set()
    for u in neighbors.keys():
        if u in visited:
            continue
        # visit the component that includes u
        size = 0
        agenda = {u}
        while agenda:
            v = agenda.pop()
            visited.add(v)
            size += 1
            agenda.update(neighbors[v] - visited)
        sizes.append(size)
    return sizes
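If the vertex groups themselves are needed rather than only their sizes, the same kind of traversal can collect them. The following is a minimal sketch, assuming the whole edge list fits in memory; the name `component_groups` is hypothetical and not part of the answer above, and it uses a breadth-first queue instead of the set-based agenda.

```python
import collections


def component_groups(edges):
    # Build an adjacency map from the edge list.
    neighbors = collections.defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    groups = []
    visited = set()
    for start in neighbors:
        if start in visited:
            continue
        # Breadth-first search over the component that contains start.
        visited.add(start)
        queue = collections.deque([start])
        group = []
        while queue:
            v = queue.popleft()
            group.append(v)
            for w in neighbors[v]:
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
        groups.append(sorted(group))
    return groups


groups = component_groups([[1, 2], [2, 3], [3, 1], [4, 5]])
print(len(groups))               # 2
print([len(g) for g in groups])  # [3, 2]
```

Each vertex and each edge is handled a constant number of times, so the traversal itself is linear in the size of the input; the `sorted` call is only there to make the output easy to read.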

Do you need to write your own algorithm? networkx already has algorithms for this.

To get the size of each component, try:

import networkx as nx

G = nx.Graph()
G.add_edges_from([[1,2],[2,3],[3,1],[4,5]])

components = []
for graph in nx.connected_components(G):
    components.append([graph, len(graph)])

components
# In Python 3: [[{1, 2, 3}, 3], [{4, 5}, 2]]

You could use a disjoint-set data structure:

edges = [[1,2],[2,3],[3,1],[4,5]]
parents = {}
size = {}

def get_ancestor(parents, item):
    # Returns ancestor for a given item and compresses path
    # Recursion would be easier but might blow stack
    stack = []
    while True:
        parent = parents.setdefault(item, item)
        if parent == item:
            break
        stack.append(item)
        item = parent

    for item in stack:
        parents[item] = parent

    return parent


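for x, y in edges:
    x = get_ancestor(parents, x)
    y = get_ancestor(parents, y)
    size_x = size.setdefault(x, 1)
    size_y = size.setdefault(y, 1)
    if x == y:
        # Already in the same component; merging again would double-count its size
        continue
    # Union by size: attach the smaller tree under the larger one
    if size_x < size_y:
        parents[x] = y
        size[y] += size_x
    else:
        parents[y] = x
        size[x] += size_y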

print(sum(1 for k, v in parents.items() if k == v)) # 2

In the code above, parents is a dict whose keys are vertices and whose values are their parents. A vertex with no parent maps to itself, which marks it as the ancestor (root) of its component. For every edge in the list, the two endpoints end up with the same ancestor. Each time an ancestor is looked up, the path to it is compressed, so later lookups take nearly constant amortized time. Combined with attaching the smaller tree to the larger one, this makes the whole algorithm run in near-linear time, O(n α(n)), where α is the very slowly growing inverse Ackermann function.
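The structure described above can also be packaged as a small class that reports the number of vertices per component directly. This is only a sketch under assumptions: the names `DisjointSet`, `find`, `union`, and `component_sizes` are illustrative and do not come from the answer, and vertices are assumed to be hashable.

```python
class DisjointSet:
    """Union-find with path compression and union by size (illustrative sketch)."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, item):
        # Register unseen vertices as their own one-element set.
        if item not in self.parent:
            self.parent[item] = item
            self.size[item] = 1
        # Walk up to the root iteratively to avoid deep recursion.
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return  # already connected; sizes must not be counted twice
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def component_sizes(edges):
    ds = DisjointSet()
    for u, v in edges:
        ds.union(u, v)
    # Roots are the vertices that are their own parent.
    return [ds.size[v] for v in ds.parent if ds.parent[v] == v]


sizes = component_sizes([[1, 2], [2, 3], [3, 1], [4, 5]])
print(len(sizes), sizes)  # 2 [3, 2]
```

The early return in `union` matters for inputs with cycles such as [3,1] in the example: without it, the size of a component would be added to itself.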

Update

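In case the components themselves are required instead of just their number, the resulting dict can be iterated to produce them. Each vertex has to be mapped to its ancestor rather than to its immediate parent, because paths are only compressed when they are queried:

from collections import defaultdict

components = defaultdict(list)
for k in list(parents):
    # Look up the ancestor; parents[k] may still be an intermediate vertex
    components[get_ancestor(parents, k)].append(k)

print(components)

Output (Python 3):

defaultdict(<class 'list'>, {1: [1, 2, 3], 4: [4, 5]})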
