
Merge two RDDs in Spark Scala

I have two RDDs.

rdd1 = (String, String)

key1, value11
key2, value12
key3, value13

rdd2 = (String, String)

key2, value22
key3, value23
key4, value24

I need to form another RDD with the merged rows from rdd1 and rdd2. The output should look like:

key2, value12 ; value22
key3, value13 ; value23

So, basically it amounts to taking the intersection of the keys of rdd1 and rdd2 and then joining their values. **Note:** the values must stay in order, i.e. value(rdd1) + value(rdd2), not the reverse.

I think this may be what you are looking for:

join(otherDataset, [numTasks])  

When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Since `join` is an inner join, only keys present in both datasets appear in the result. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

See the associated section of the docs
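To make this concrete, here is a minimal sketch using the sample data from the question (it assumes a `SparkContext` named `sc` is already available, as in the Spark shell). `join` keeps only the keys present in both RDDs and pairs rdd1's value first, which gives exactly the ordering the question asks for:

```scala
val rdd1 = sc.parallelize(Seq(
  ("key1", "value11"), ("key2", "value12"), ("key3", "value13")))
val rdd2 = sc.parallelize(Seq(
  ("key2", "value22"), ("key3", "value23"), ("key4", "value24")))

// Inner join: only key2 and key3 survive; each value is a (v1, v2) tuple
// with rdd1's value in the first position.
val merged = rdd1.join(rdd2)                      // RDD[(String, (String, String))]
  .mapValues { case (v1, v2) => s"$v1 ; $v2" }    // format as "value12 ; value22"

merged.sortByKey().collect().foreach(println)
```

Note that RDD contents have no guaranteed order across partitions, so the `sortByKey()` here is only to make the printed output deterministic; drop it if you don't care about ordering.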
