Get disconnected pairs of nodes in the network graph?

Question

This is my dataset:

4095    546
3213    2059 
4897    2661 
...
3586    2583
3437    3317
3364    1216

Each line is a pair of nodes that have an edge between them. The whole dataset builds a graph. But I want to get many node pairs that are not connected to each other. How can I get 1000 (or more) such node pairs from the dataset? Such as:

2761    2788
4777    3365
3631    3553
...
3717    4074
3013    2225

Each line is a pair of nodes without an edge between them.

Answer 1

Just do a BFS or DFS to get the size of every connected component in O(|E|) time. Then once you have the component sizes, you can get the number of disconnected pairs of nodes easily: it's the sum of the products of every pair of sizes.

E.g., if your graph has 3 connected components with sizes 50, 20, and 100, then the number of pairs of disconnected nodes is 50*20 + 50*100 + 20*100 = 8000.
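To make the counting step concrete, here is a minimal sketch of that idea, assuming the dataset has already been read into a list of `(u, v)` tuples and the graph is undirected. The function names `component_sizes` and `count_disconnected_pairs` are made up for illustration. Instead of looping over every pair of sizes, it uses the equivalent identity (total² − sum of squares) / 2.

```python
from collections import defaultdict, deque

def component_sizes(edges):
    """Return the size of each connected component (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        # Plain BFS from an unvisited node covers exactly one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sizes

def count_disconnected_pairs(sizes):
    """Sum of products over all pairs of sizes, computed in linear time."""
    total = sum(sizes)
    return (total * total - sum(s * s for s in sizes)) // 2

print(count_disconnected_pairs([50, 20, 100]))  # 8000, matching the example
```

Note that nodes which never appear in any edge are invisible to this sketch, since it only knows about nodes listed in the edge file.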

If you want to actually output the disconnected pairs instead of just counting them, you should probably use union-find and then just iterate through all pairs of nodes and output them if they're not in the same component.
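If you only need 1000 or so pairs rather than all of them, iterating through every pair may be more work than necessary. A rough sketch of the union-find idea combined with random sampling could look like the following. It assumes an undirected graph given as a list of `(u, v)` tuples; `find`, `sample_disconnected_pairs`, and the `max_tries` cutoff are illustrative names and choices, not something from the question.

```python
import random

def find(parent, x):
    """Return the root of x's set, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def sample_disconnected_pairs(edges, k, seed=None, max_tries=None):
    """Sample up to k node pairs that lie in different components."""
    parent = {}
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv  # union the two components

    nodes = list(parent)
    if len(nodes) < 2:
        return []

    rng = random.Random(seed)
    limit = 100 * k if max_tries is None else max_tries
    pairs = set()
    tries = 0
    # Rejection sampling: draw random pairs, keep those in different components.
    while len(pairs) < k and tries < limit:
        tries += 1
        a, b = rng.sample(nodes, 2)
        if find(parent, a) != find(parent, b):
            pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)
```

If the graph is connected there are no such pairs, and the function returns an empty list once `max_tries` is exhausted. If most nodes sit in one giant component, the rejection rate will be high, and enumerating pairs across components directly may be the better choice.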

Answer 2

Please see the part under the EDIT!

I think the other options are more general, and probably nicer from a programmatic view. I just had a quick idea how you could get the list in a very easy way using numpy.

First, store your list of edges as an array (called node_list here, filled with random data for the demo) and build the adjacency matrix from it:

    import numpy as np
    node_list= np.random.randint(10 , size=(10, 2))
    A = np.zeros((np.max(node_list) + 1, np.max(node_list) + 1)) # + 1 to account for zero indexing
    A[node_list[:, 0],  node_list[:, 1]] = 1 # set connected nodes to 1
    x, y = np.where(A == 0) # Find disconnected nodes
    disconnected_list = np.vstack([x, y]).T # The final list of disconnected nodes

I have no idea, though, how well this will work with really large-scale networks, since the dense adjacency matrix grows quadratically with the number of nodes.

EDIT: The above solution was me thinking a bit too fast. As it stands, it returns the missing edges of a directed graph, not the disconnected pairs of nodes. Furthermore, disconnected_list includes each pair twice. Here is a hacky second attempt at a solution:

    import numpy as np
    node_list= np.random.randint(10 , size=(10, 2))
    A = np.zeros((np.max(node_list) + 1, np.max(node_list) + 1)) # + 1 to account for zero indexing
    A[node_list[:, 0], node_list[:, 1]] = 1 # set connected nodes to 1 
    A[node_list[:, 1], node_list[:, 0]] = 1 # Make the graph symmetric
    A = A + np.triu(np.ones(A.shape)) # Add ones to the upper triangle and the diagonal,
    # so np.where reports each pair only once (pass k=1 to np.triu if you want to consider the diagonal)
    x, y = np.where(A == 0) # Find disconnected nodes
    disconnected_list = np.vstack([x, y]).T # The final list of disconnected nodes
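For a large network where an n x n matrix does not fit in memory, one possible alternative is to sample pairs at random and reject those that are edges. This is only a sketch under assumptions: the graph is undirected, "disconnected" means "no direct edge" (as in the matrix approach above), and `sample_non_adjacent_pairs` is a made-up helper name.

```python
import random

def sample_non_adjacent_pairs(edges, k, seed=None):
    """Sample up to k distinct node pairs that have no edge between them."""
    # Store edges as unordered pairs so (u, v) and (v, u) are the same edge.
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({n for e in edges for n in e})
    n = len(nodes)

    # Cap k at the number of non-adjacent pairs so the loop can terminate.
    real_edges = sum(1 for e in edge_set if len(e) == 2)  # ignore self-loops
    k = min(k, n * (n - 1) // 2 - real_edges)

    rng = random.Random(seed)
    result = set()
    while len(result) < k:
        a, b = rng.sample(nodes, 2)
        if frozenset((a, b)) not in edge_set:
            result.add((min(a, b), max(a, b)))
    return sorted(result)

# Path graph 0-1-2-3: the only non-adjacent pairs are (0, 2), (0, 3), (1, 3).
print(sample_non_adjacent_pairs([(0, 1), (1, 2), (2, 3)], 10, seed=1))
```

Memory use is proportional to the number of edges rather than n². On a sparse graph almost every random pair is accepted, so collecting 1000 pairs should be quick; on a very dense graph the rejection loop can become slow.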
