
Algorithm/approach to find a destination node from a source node within exactly k edges, in an undirected acyclic graph of a billion nodes

Suppose I have an adjacency list for a billion nodes, stored in a hash table arranged as follows:

key = source node
value = hash_table {node1, node2, node3}

The input values come from a text file in the form
from , to
1,2
1,5
1,11
... so on

e.g. key = '1'
value = {'2','5','11'}
means node 1 is connected to nodes 2, 5, and 11.
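A minimal sketch of building that adjacency list in Python (the file path `edges.txt` is an assumption; since the graph is undirected, each edge is stored in both directions):

```python
from collections import defaultdict

def load_adjacency(path):
    """Build an undirected adjacency list: node -> set of neighbours."""
    adj = defaultdict(set)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("from"):
                continue  # skip the header and blank lines
            src, dst = (p.strip() for p in line.split(","))
            adj[src].add(dst)
            adj[dst].add(src)  # undirected: store the edge both ways
    return adj
```

With the sample input above, `load_adjacency("edges.txt")["1"]` would be `{'2', '5', '11'}`.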

I want an algorithm or approach to find a destination node from a source node within exactly k edges, in an undirected acyclic graph of a billion nodes.

For example, from node 1 I want to find node 50 only up to depth 3, i.e. within 3 edges.

My assumption is that the algorithm finds 1 - 2 - 60 - 50, which is the shortest path, but how can the traversal be made efficient with the adjacency list structure above? I do not want to use Hadoop/MapReduce.

I came up with the naive solution below in Python, but it is not efficient. The only advantage is that the hash table looks up a key in O(1), so I can search the neighbours, and their billions of neighbours, directly by key. The following algorithm takes a lot of time:

  1. Start with the source node.
  2. Use a hash table lookup to find its key.
  3. Go one level deeper into the hash tables of the neighbour nodes and search their values for the destination node, until it is found.
  4. Stop if the node is not found within depth k.
1
|
{2, 5, 11}
|       |       |
{3,6,7} {nodes} {nodes} ... connected nodes
|       |       |
{nodes} {nodes} {nodes} ... millions more connected nodes
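The steps above can be sketched as a depth-limited BFS (a sketch only; `adj` is the in-memory hash table described earlier, with string node keys):

```python
from collections import deque

def find_within_k(adj, source, target, k):
    """Breadth-first search that stops after k edges.
    Returns the path source -> ... -> target, or None if the target
    is not reachable within k edges."""
    if source == target:
        return [source]
    visited = {source}
    queue = deque([(source, [source])])  # (node, path so far)
    while queue:
        node, path = queue.popleft()
        if len(path) - 1 >= k:
            continue  # expanding this node would exceed k edges
        for nbr in adj.get(node, ()):
            if nbr in visited:
                continue
            if nbr == target:
                return path + [nbr]
            visited.add(nbr)
            queue.append((nbr, path + [nbr]))
    return None
```

Because BFS explores level by level, the first path it returns is also the shortest one, e.g. 1 - 2 - 60 - 50 in the example above.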


Please suggest an approach. The algorithm above, implemented like a BFS, takes more than 3 hours to search all the possible key-value relationships. Can that be reduced with another search method?

As you've hinted, this will depend a lot on the data access characteristics of your system. If you were restricted to single-element accesses, then you'd be truly stuck, as trincot observes. However, if you can manage block accesses, then you have a chance of parallel operations.

However, I think that would be outside your control: the hash function owns the adjacency characteristics -- and, in fact, will likely "pessimize" (opposite of "optimize") that characteristic.

I do see one possible hope: use iteration instead of recursion, maintaining a list of nodes to visit. When you place a new node on the list, get its hash value. If you can organize the nodes clustered by location, you can perhaps do a block transfer, accessing several values in one read operation.
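A sketch of that iterative, frontier-based idea (pure Python, no recursion; the comment marks the point where, with a disk- or network-backed store, the whole frontier could be fetched in one block read instead of per-node lookups):

```python
def find_within_k_frontier(adj, source, target, k):
    """Iterative level-by-level search with an explicit frontier list.
    Returns True if target is reachable from source within k edges."""
    visited = {source}
    frontier = [source]
    for _ in range(k):
        # With a block-access store, issue ONE batched read here for
        # the entire frontier, rather than one lookup per node.
        next_frontier = []
        for node in frontier:
            for nbr in adj.get(node, ()):
                if nbr == target:
                    return True
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        if not frontier:
            break  # no new nodes reachable; give up early
    return False
```

The design point is that the frontier list makes each level's reads explicit and groupable, which is what makes block transfers (or even parallel fetches) possible in the first place.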
