I have two RDDs.
rdd1 = (String, String)
key1, value11
key2, value12
key3, value13
rdd2 = (String, String)
key2, value22
key3, value23
key4, value24
I need to form another RDD with the merged rows from rdd1 and rdd2. The output should look like:
key2, value12 ; value22
key3, value13 ; value23
So basically, I want to take the intersection of the keys of rdd1 and rdd2 and then join their values. The values must stay in order, i.e. value(rdd1) + value(rdd2), not the reverse.
I think this may be what you are looking for:
join(otherDataset, [numTasks])
When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.
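Applied to the two RDDs from the question, this could look roughly like the sketch below. It assumes a local SparkContext; the object name `JoinExample` and the helper `mergeValues` are made up for illustration. Since join() returns (K, (V, W)) with the value from the RDD it is called on first, calling it on rdd1 keeps the order value(rdd1), then value(rdd2).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3 you may also need:
// import org.apache.spark.SparkContext._

object JoinExample {
  // Hypothetical helper: inner-join two pair RDDs and concatenate the values.
  // join() only emits keys present in both RDDs, and the tuple is
  // (value from left, value from right), so the order is preserved.
  def mergeValues(left: RDD[(String, String)],
                  right: RDD[(String, String)]): RDD[(String, String)] =
    left.join(right).mapValues { case (v1, v2) => s"$v1 ; $v2" }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("JoinExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val rdd1 = sc.parallelize(Seq(
      ("key1", "value11"), ("key2", "value12"), ("key3", "value13")))
    val rdd2 = sc.parallelize(Seq(
      ("key2", "value22"), ("key3", "value23"), ("key4", "value24")))

    val merged = mergeValues(rdd1, rdd2)

    // collect() does not guarantee any row order, so sort by key for display.
    merged.collect().sortBy(_._1).foreach { case (k, v) => println(s"$k, $v") }

    sc.stop()
  }
}
```

key1 and key4 are dropped because they appear in only one of the RDDs. If a key occurs more than once on either side, join() emits one row for every pairing of its values.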
Check join() in PairRDDFunctions:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions