This is my dataset:
4095 546
3213 2059
4897 2661
...
3586 2583
3437 3317
3364 1216
Each line is a pair of nodes which have an edge between them. The whole dataset build an graph. But I want to get many node pairs which are disconnected with each other. How can I get 1000(or more) such node pairs from dataset? Such as:
2761 2788
4777 3365
3631 3553
...
3717 4074
3013 2225
Each line is a pair of nodes without edge.
Just do a BFS or DFS to get the size of every connected component in O(|E|)
time. Then once you have the component sizes, you can get the number of disconnected nodes easily: it's the sum of the products of every pair of sizes.
Eg. If your graph has 3 connected components with sizes: 50, 20, 100. Then the number of pairs of disconnected nodes is: 50*20 + 50*100 + 20*100 = 8000
.
If you want to actually output the disconnected pairs instead of just counting them, you should probably use union-find and then just iterate through all pairs of nodes and output them if they're not in the same component.
Please see the part under the EDIT!
I think the other options are more general, and probably nicer from a programmatic view. I just had a quick idea how you could get the list in a very easy way using numpy.
First create the adjacency matrix and your list of nodes is an array:
import numpy as np
node_list= np.random.randint(10 , size=(10, 2))
A = np.zeros((np.max(node_list) + 1, np.max(node_list) + 1)) # + 1 to account for zero indexing
A[node_list[:, 0], node_list[:, 1]] = 1 # set connected nodes to 1
x, y = np.where(A == 0) # Find disconnected nodes
disconnected_list = np.vstack([x, y]).T # The final list of disconnected nodes
I have no idea though, how this will work with really large scale networks.
EDIT: The above solution was me thinking a bit too fast. As of now the solution above provides the missing edges between nodes, not the disconnected nodes (in the case of a directed graph). Furthermore, the disconnected_list includes the each node twice. Here is a hacky second idea of solution:
import numpy as np
node_list= np.random.randint(10 , size=(10, 2))
A = np.zeros((np.max(node_list) + 1, np.max(node_list) + 1)) # + 1 to account for zero indexing
A[node_list[:, 0], node_list[:, 1]] = 1 # set connected nodes to 1
A[node_list[:, 1], node_list[:, 0]] = 1 # Make the graph symmetric
A = A + np.triu(np.ones(A.shape)) # Add ones to the upper triangular
# matrix, so they are not considered in np.where (set k if you want to consider the diagonal)
x, y = np.where(A == 0) # Find disconnected nodes
disconnected_list = np.vstack([x, y]).T # The final list of disconnected nodes
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.