This question has been asked before, but I could not understand the answers properly.
I have two RDDs with the same number of columns and the same number of records:
RDD1(col1,col2,col3)
and
RDD2(colA,colB,colC)
I need to join them as follows:
RDD_FINAL(col1,col2,col3,colA,colB,colC)
There is no key to join the records on, but they are in order, which means the first record of RDD1 corresponds to the first record of RDD2.
You can use the zipWithIndex method to add each row's index as a key to both RDDs, and then join them by that key.
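Adding a code snippet for Alfilercio's example. It uses zipWithIndex rather than zipWithUniqueId, because the IDs from zipWithUniqueId depend on how each RDD is partitioned and are not guaranteed to match row positions across two RDDs. Here col1 ... colC stand for your column types.
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;
import scala.Tuple3;
JavaRDD<Tuple3<col1,col2,col3>> rdd1 = ...
JavaPairRDD<Long, Tuple3<col1,col2,col3>> pairRdd1 = rdd1.zipWithIndex().mapToPair(pair -> new Tuple2<>(pair._2(), pair._1()));
JavaRDD<Tuple3<colA,colB,colC>> rdd2 = ...
JavaPairRDD<Long, Tuple3<colA,colB,colC>> pairRdd2 = rdd2.zipWithIndex().mapToPair(pair -> new Tuple2<>(pair._2(), pair._1()));
// join does not preserve order, so sort by the index before dropping it
JavaRDD<Tuple2<Tuple3<col1,col2,col3>, Tuple3<colA,colB,colC>>> mappedRdd = pairRdd1.join(pairRdd2).sortByKey().values();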
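If it helps to see what the index-then-join approach does to the data, here is a small sketch in plain Java collections, with no Spark involved. The class and method names (IndexJoinSketch, keyByIndex, joinByIndex) are made up for illustration, and rows are modelled as lists of strings, which is an assumption rather than anything from the question. It only mimics the logic (key each record by its position, then inner-join on that key) and says nothing about how Spark distributes the work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; no Spark classes are used here.
final class IndexJoinSketch {

    // Mimics zipWithIndex + mapToPair: key each record by its position (0, 1, 2, ...).
    static <T> Map<Long, T> keyByIndex(List<T> records) {
        Map<Long, T> keyed = new LinkedHashMap<>();
        long index = 0;
        for (T record : records) {
            keyed.put(index++, record);
        }
        return keyed;
    }

    // Mimics an inner join on the index key: matching rows have their columns concatenated.
    static List<List<String>> joinByIndex(List<List<String>> left, List<List<String>> right) {
        Map<Long, List<String>> keyedLeft = keyByIndex(left);
        Map<Long, List<String>> keyedRight = keyByIndex(right);
        List<List<String>> joined = new ArrayList<>();
        for (Map.Entry<Long, List<String>> entry : keyedLeft.entrySet()) {
            List<String> match = keyedRight.get(entry.getKey());
            if (match == null) {
                continue; // inner join: a row without a partner is dropped
            }
            List<String> row = new ArrayList<>(entry.getValue());
            row.addAll(match);
            joined.add(row);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<List<String>> rdd1 = Arrays.asList(
                Arrays.asList("r1c1", "r1c2", "r1c3"),
                Arrays.asList("r2c1", "r2c2", "r2c3"));
        List<List<String>> rdd2 = Arrays.asList(
                Arrays.asList("r1cA", "r1cB", "r1cC"),
                Arrays.asList("r2cA", "r2cB", "r2cC"));
        // Each output row holds col1,col2,col3 followed by colA,colB,colC of the same position.
        System.out.println(joinByIndex(rdd1, rdd2));
    }
}
```

Because the key is just the row position, this only gives the result you want if both inputs really are in the same order, which is the assumption stated in the question.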