
Spark join - (edges and vertices)

I have a vertexRDD with two columns:

(vertexId, uniqueVertexId)
(V1, 1L)
(V2, 2L)
(V3, 3L)
(V4, 4L)

I also have an edgeRDD:

(srcId, destId)
(V1, V2)
(V2, V3)
(V1, V4)

How can I join them in Spark so that the edges RDD looks like the following?

(srcId, destId, uniqueSrcId, uniqueDestId)
(V1, V2, 1L, 2L)
(V2, V3, 2L, 3L)
(V1, V4, 1L, 4L)

I tried different joins but couldn't get the expected output. I'd appreciate any help.

I will use Java, but it should be straightforward to convert to Scala.
Assuming edgeRDD has type JavaPairRDD<String,String> (keyed by srcId) and vertexRDD has type JavaPairRDD<String,Long>:

  1. edgeRDD.join(vertexRDD) yields a JavaPairRDD<String,Tuple2<String,Long>> keyed by srcId, with the following content (let's call it join1):

     (V1, Tuple2(V2,1L))
     (V2, Tuple2(V3,2L))
     (V1, Tuple2(V4,1L))
  2. Then convert join1 into another JavaPairRDD<String,Tuple2<String,Long>> by swapping srcId and destId with a map (mapToPair in the Java API), so that destId becomes the key (let's call it join2):

     (V2, Tuple2(V1,1L))
     (V3, Tuple2(V2,2L))
     (V4, Tuple2(V1,1L))
  3. Finally, perform vertexRDD.join(join2) to get a JavaPairRDD<String,Tuple2<Long,Tuple2<String,Long>>> keyed by destId, with the following content:

     (V2, Tuple2(2L, Tuple2(V1,1L)))
     (V3, Tuple2(3L, Tuple2(V2,2L)))
     (V4, Tuple2(4L, Tuple2(V1,1L)))

which you can pass through a map to create a JavaRDD<String> (or a new JavaPairRDD) by combining the keys and values as needed. I will leave the mapping phases up to you.
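
For reference, the three steps plus the final mapping could be sketched roughly as follows. This is only one possible way to write it: the class name EdgeVertexJoin, the helper joinEdges, the local[*] master, and the choice of scala.Tuple4 as the output type are assumptions made for illustration and are not part of the question or the answer above.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;
import scala.Tuple4;

public class EdgeVertexJoin {

    // Hypothetical helper: returns (srcId, destId, uniqueSrcId, uniqueDestId).
    static JavaRDD<Tuple4<String, String, Long, Long>> joinEdges(
            JavaPairRDD<String, String> edgeRDD,
            JavaPairRDD<String, Long> vertexRDD) {

        // Step 1: edges are keyed by srcId, so this join attaches uniqueSrcId.
        // Result: (srcId, (destId, uniqueSrcId))
        JavaPairRDD<String, Tuple2<String, Long>> join1 = edgeRDD.join(vertexRDD);

        // Step 2: re-key by destId, keeping (srcId, uniqueSrcId) as the value.
        // Result: (destId, (srcId, uniqueSrcId))
        JavaPairRDD<String, Tuple2<String, Long>> join2 = join1.mapToPair(
                t -> new Tuple2<>(t._2()._1(), new Tuple2<>(t._1(), t._2()._2())));

        // Step 3: join on destId to attach uniqueDestId.
        // Result: (destId, (uniqueDestId, (srcId, uniqueSrcId)))
        JavaPairRDD<String, Tuple2<Long, Tuple2<String, Long>>> join3 =
                vertexRDD.join(join2);

        // Final mapping: flatten into (srcId, destId, uniqueSrcId, uniqueDestId).
        return join3.map(t -> new Tuple4<>(
                t._2()._2()._1(),   // srcId
                t._1(),             // destId
                t._2()._2()._2(),   // uniqueSrcId
                t._2()._1()));      // uniqueDestId
    }

    public static void main(String[] args) {
        // Local master chosen only so the sketch can run without a cluster.
        SparkConf conf = new SparkConf()
                .setAppName("EdgeVertexJoin")
                .setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Tuple2<String, Long>> vertices = Arrays.asList(
                    new Tuple2<>("V1", 1L), new Tuple2<>("V2", 2L),
                    new Tuple2<>("V3", 3L), new Tuple2<>("V4", 4L));
            List<Tuple2<String, String>> edges = Arrays.asList(
                    new Tuple2<>("V1", "V2"), new Tuple2<>("V2", "V3"),
                    new Tuple2<>("V1", "V4"));

            JavaPairRDD<String, Long> vertexRDD = sc.parallelizePairs(vertices);
            JavaPairRDD<String, String> edgeRDD = sc.parallelizePairs(edges);

            // Row order after the joins is not guaranteed.
            joinEdges(edgeRDD, vertexRDD).collect().forEach(System.out::println);
        }
    }
}
```

Note that join is an inner join, so any edge whose srcId or destId is missing from vertexRDD is dropped from the result.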
