
What is an efficient way of traversing a graph with BFS using MapReduce?

This was an interview question a recruiter asked me. The problem is basically to calculate the shortest path from every node to every other node, and my solution was the following.

Initialize all possible source/destination pairs, ignoring direction (A-B is the same as B-A).

Each node is represented as a tuple (src, cost, current_list, dest), where src and dest range over all the pairs initialized earlier.

Map:

For each edge you traverse, you duplicate your tuple, add the newly traversed
node to current_list and add the edge's cost to cost.
If the node is the destination, you annotate finish; if the node is already
in current_list, you annotate delete.

Reduce:

You don't really need to do anything besides outputting the finish tuples and
dropping the delete tuples, letting the remaining tuples go through the next round of map.
By outputting I mean: for each (src, dest) pair, output only the least cost.

The recruiter said this is not efficient. I can see why, since you are traversing combinatorially, but the only alternative I can think of is: if you have n nodes, spawn n servers and run Dijkstra from each node, which the recruiter said is also wrong. Can someone give me some help with this problem?

Edit:

Example: triangle graph

The edges are AB, BC and CA, each with a path cost of 1.

Algorithm

  1. First we initialize all possible source/destination pairs, keeping in mind that an edge and its reverse are not distinct: AB, AC, BC (BA, CA, CB are omitted).

For each source/destination pair we have the following tuple:

(src=A, cost=None, current_list=A, dest=B, annotate=continue)
(src=A, cost=None, current_list=A, dest=C, annotate=continue)
(src=B, cost=None, current_list=B, dest=C, annotate=continue)
  2. Now we start the MapReduce algorithm.

     For each tuple in the tuple list we initialized:
         for each neighbor of the node at the end of current_list:
             duplicate the current tuple, append the neighbor to its current_list and add the path cost to its cost
             if the neighbor is already in current_list: set annotate = delete
             elif the neighbor is dest: set annotate = finish
             else: keep annotate = continue
         delete the current tuple

In our case:

(src=A, cost=None, current_list=A, dest=B, annotate=continue)
 =>
(src=A, cost=1, current_list=AB, dest=B, annotate=finish)
(src=A, cost=1, current_list=AC, dest=B, annotate=continue)

(src=A, cost=None, current_list=A, dest=C, annotate=continue)
=>
(src=A, cost=1, current_list=AC, dest=C, annotate=finish)
(src=A, cost=1, current_list=AB, dest=C, annotate=continue)

(src=B, cost=None, current_list=B, dest=C, annotate=continue)
=>
(src=B, cost=1, current_list=BC, dest=C, annotate=finish)
(src=B, cost=1, current_list=BA, dest=C, annotate=continue)
  3. Reduce

    Note: we reduce on the (src, dest) pair, using it as the key for every tuple in the tuple list.

     if annotate == finish: keep track of the min cost for the (src, dest) pair, delete every tuple of that pair that is not the current min, and pass the current min on as the result
     elif annotate == delete: delete the tuple
     else: pass the tuple down to the next round of map
  4. Map

Since we still have some tuples with annotate = continue:

(src=B, cost=1, current_list=BA, dest=C, annotate=continue)  
=>
(src=B, cost=2, current_list=BAC, dest=C, annotate=finish)  
(src=B, cost=2, current_list=BAB, dest=C, annotate=delete)  


(src=A, cost=1, current_list=AC, dest=B, annotate=continue)
=>
(src=A, cost=2, current_list=ACB, dest=B, annotate=finish)
(src=A, cost=2, current_list=ACA, dest=B, annotate=delete)

(src=A, cost=1, current_list=AB, dest=C, annotate=continue)
=>
(src=A, cost=2, current_list=ABC, dest=C, annotate=finish)
(src=A, cost=2, current_list=ABA, dest=C, annotate=delete)
  5. Reduce

We have no continue tuples left, so now we just use the reduce to find the min for each (src, dest) pair.
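A minimal in-memory sketch of the scheme described in the question, assuming an undirected weighted graph stored as a nested dict and single-letter node names (so `current_list` can be a string, as in the example). `Record`, `map_step`, `reduce_step` and `all_pairs` are hypothetical names, and a plain Python loop stands in for the MapReduce framework's rounds and shuffle. Because it extends every simple path between every pair, the number of tuples can grow combinatorially, which matches the recruiter's objection.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Record(NamedTuple):
    src: str
    cost: Optional[int]   # None until the first edge is traversed
    current_list: str     # nodes visited so far (single-letter names assumed)
    dest: str
    annotate: str         # "continue", "finish" or "delete"


def map_step(record, graph):
    """Extend a 'continue' record by each neighbour of the last node in current_list."""
    last = record.current_list[-1]
    out = []
    for neighbor, weight in graph[last].items():
        if neighbor in record.current_list:
            annotate = "delete"      # would revisit a node, i.e. a cycle
        elif neighbor == record.dest:
            annotate = "finish"      # reached the destination
        else:
            annotate = "continue"    # keep exploring in the next round
        out.append(Record(record.src, (record.cost or 0) + weight,
                          record.current_list + neighbor, record.dest, annotate))
    return out


def reduce_step(records, best):
    """Key on (src, dest): keep the cheapest finished path, drop deletes, forward continues."""
    groups = defaultdict(list)       # stands in for the shuffle by key
    for r in records:
        groups[(r.src, r.dest)].append(r)
    next_round = []
    for key, group in groups.items():
        for r in group:
            if r.annotate == "finish":
                if key not in best or r.cost < best[key].cost:
                    best[key] = r
            elif r.annotate == "continue":
                next_round.append(r)
            # "delete" records are simply not forwarded
    return next_round


def all_pairs(graph):
    nodes = sorted(graph)
    # One starting record per unordered pair (A-B counts the same as B-A).
    frontier = [Record(s, None, s, d, "continue")
                for i, s in enumerate(nodes) for d in nodes[i + 1:]]
    best = {}
    while frontier:                  # one map + reduce round per iteration
        mapped = [out for r in frontier for out in map_step(r, graph)]
        frontier = reduce_step(mapped, best)
    return {key: (r.cost, r.current_list) for key, r in best.items()}
```

On the triangle graph every pair ends with cost 1. With weights A-B = 5 and A-C = C-B = 1, the cheaper two-hop path A-C-B replaces the direct edge in the second round.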

The inner two loops of Floyd-Warshall are essentially matrix multiplication with addition replaced by min and multiplication replaced by addition: for a fixed k, they combine column k and row k with + and fold the result into dist with min. You can do matrix multiplication with a MapReduce, so you can implement Floyd-Warshall with |V| MapReduces.
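To illustrate the semiring swap, here is a sketch of a common one-pass MapReduce matrix product with + replaced by min and × replaced by +, simulated in memory (a dict keyed by (i, j) plays the role of the shuffle). `min_plus_product` is a hypothetical name. Squaring a distance matrix that has zeros on its diagonal this way yields the shortest paths that use at most two edges.

```python
import math
from collections import defaultdict

INF = math.inf


def min_plus_product(A, B):
    """C[i][j] = min over k of A[i][k] + B[k][j], written as one map and one reduce."""
    n = len(A)
    shuffled = defaultdict(list)     # stands in for the framework's shuffle
    # Map: each entry of A and of B is sent to every output cell that needs it.
    for i in range(n):
        for k in range(n):
            for j in range(n):       # A[i][k] is needed by every cell in row i
                shuffled[(i, j)].append(("A", k, A[i][k]))
    for k in range(n):
        for j in range(n):
            for i in range(n):       # B[k][j] is needed by every cell in column j
                shuffled[(i, j)].append(("B", k, B[k][j]))
    # Reduce: join on k, "multiply" with + and "sum" with min.
    C = [[INF] * n for _ in range(n)]
    for (i, j), values in shuffled.items():
        a = {k: v for tag, k, v in values if tag == "A"}
        b = {k: v for tag, k, v in values if tag == "B"}
        C[i][j] = min(a[k] + b[k] for k in a)
    return C
```

For the path graph 0-1-2 with unit weights, squaring the one-hop distance matrix fills in the two-hop distance between 0 and 2.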

From the Wikipedia page on Floyd-Warshall:

1 let dist be a |V| × |V| array of minimum distances initialized to ∞ (infinity)
2 for each vertex v
3    dist[v][v] ← 0
4 for each edge (u,v)
5    dist[u][v] ← w(u,v)  // the weight of the edge (u,v)
6 for k from 1 to |V|
7    for i from 1 to |V|
8       for j from 1 to |V|
9          if dist[i][j] > dist[i][k] + dist[k][j] 
10             dist[i][j] ← dist[i][k] + dist[k][j]
11         end if

The inner two loops (i and j, lines 7 to 11) have the same structure as matrix multiplication, so you can adapt any "matrix multiplication on MapReduce" solution to perform them.

The outer (k) loop becomes |V| MapReduces.
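One way to picture a single round of the outer loop as a map and a reduce, simulated in memory with a dict standing in for the shuffle. It assumes there are no negative cycles, so row k and column k do not change during round k and can be shipped to every mapper as side data. `fw_round` and `floyd_warshall_rounds` are hypothetical names; a real job would more likely partition the matrix into blocks than emit one record per cell.

```python
import math
from collections import defaultdict


def fw_round(dist, k):
    """One iteration of the outer k loop as a map and a reduce over the cells (i, j)."""
    n = len(dist)
    # Side data every mapper needs. With dist[k][k] == 0 (no negative cycles),
    # row k and column k are unchanged by round k, so a snapshot is safe.
    col_k = [dist[i][k] for i in range(n)]
    row_k = list(dist[k])
    shuffled = defaultdict(list)
    # Map: cell (i, j) emits its current value and the candidate path through k.
    for i in range(n):
        for j in range(n):
            shuffled[(i, j)].append(dist[i][j])
            shuffled[(i, j)].append(col_k[i] + row_k[j])
    # Reduce: keep the smaller value per key.
    new = [[math.inf] * n for _ in range(n)]
    for (i, j), values in shuffled.items():
        new[i][j] = min(values)
    return new


def floyd_warshall_rounds(dist):
    for k in range(len(dist)):      # |V| map-reduce rounds
        dist = fw_round(dist, k)
    return dist
```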

I would like to propose the following approach for finding shortest paths in a graph via MapReduce.

Let's start with a tiny example, which will build intuition for the rest of the algorithm.

Imagine that information about the graph is stored in the form of adjacency lists, with a payload representing the path between the corresponding nodes. For example:

A -> [ {B, "AB"}, {C, "AC"}, {D, "AD"} ]

From this example, we can "infer" information about the following connections in the graph:

1) Direct connections

  • A -> B (path: "AB")
  • A -> C (path: "AC")
  • A -> D (path: "AD")

2) Transitive connections through node A

  • B -> C (path: "BAC")

    (where path("B -> C") is reverse(path("A -> B")) followed by path("A -> C"), with the shared node A written only once; reading a path backwards assumes the graph is undirected)

  • C -> B (path: "CAB")
  • C -> D (path: "CAD")
  • D -> C (path: "DAC")
  • D -> B (path: "DAB")
  • B -> D (path: "BAD")
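The inference above can be sketched as a map function, assuming an undirected graph (so a stored path can be read backwards) and single-letter node names as in the example. `map_entry` is a hypothetical name; it emits (source, (target, path)) key-value pairs, with the source as the key for the later grouping.

```python
def map_entry(source, neighbors):
    """Emit every (node, (target, path)) pair inferable from one adjacency-list entry.

    neighbors is a list of (target, path) with each path starting at source.
    """
    out = []
    # 1) Direct connections: re-emit the entry's own pairs.
    for target, path in neighbors:
        out.append((source, (target, path)))
    # 2) Transitive connections a -> source -> b: walk path_a backwards to the
    #    source, then continue along path_b without repeating the source node.
    for a, path_a in neighbors:
        for b, path_b in neighbors:
            if a != b:
                out.append((a, (b, path_a[::-1] + path_b[1:])))
    return out
```

Applied to the entry `A -> [{B, "AB"}, {C, "AC"}, {D, "AD"}]`, it yields the three direct and six transitive connections listed above.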

In other words, we just "mapped" one entry of the adjacency list to multiple pairs of mutually reachable nodes (a path exists for every generated pair).

Each pair of nodes actually represents a connection: Source -> Target.

So now we can combine all pairs that have the same source node:

Source -> [{Target 1, "Path-to-Target-1"}, {Target 2, "Path-to-target-2"}, ...]

After combining, each source is associated with a list of target nodes, and the list might contain duplicate target nodes (duplicates just correspond to different possible paths).

So we just need to remove duplicates from the list of target nodes, keeping for each target only the entry with the shortest path.

The two paragraphs above describe the reduce step. The output of the reduce step has the same form as the input to the map step.

So, finally, just repeat these map and reduce steps until convergence, i.e. until a round no longer changes any list.
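Putting the map, the group-by-source reduce and the repeat-until-convergence loop together gives a small in-memory simulation, assuming an undirected, unweighted graph where path length (number of characters minus one) is the cost. `map_entry`, `reduce_by_source` and `all_pairs_shortest_paths` are hypothetical names, and ties between equally short paths are broken alphabetically only to make the output deterministic.

```python
from collections import defaultdict


def map_entry(source, neighbors):
    # Direct connections, plus every a -> source -> b (undirected graph assumed).
    out = [(source, (t, p)) for t, p in neighbors]
    out += [(a, (b, pa[::-1] + pb[1:]))
            for a, pa in neighbors for b, pb in neighbors if a != b]
    return out


def reduce_by_source(pairs):
    """Group by source and keep only the shortest path to each target."""
    best = defaultdict(dict)
    for s, (t, p) in pairs:
        cur = best[s].get(t)
        if cur is None or (len(p), p) < (len(cur), cur):
            best[s][t] = p
    # Same shape as the map input: source -> [(target, path), ...]
    return {s: sorted(ts.items()) for s, ts in best.items()}


def all_pairs_shortest_paths(adjacency):
    """Repeat map -> shuffle -> reduce until a round returns the lists unchanged."""
    current = {s: sorted(ts) for s, ts in adjacency.items()}
    while True:
        pairs = [kv for s, ts in current.items() for kv in map_entry(s, ts)]
        nxt = reduce_by_source(pairs)
        if nxt == current:
            return current
        current = nxt
```

On the path graph A-B-C-D, node A ends up with paths to B, C and D of one, two and three edges respectively.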

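Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.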

© 2020-2024 STACKOOM.COM