I have two RDDs that look like this:
rdd1 = [(12, "abcd", "lmno"), (45, "wxyz", "rstw"), (67, "asdf", "wert")]
rdd2 = [(12, "abcd", "lmno"), (87, "whsh", "jnmk"), (45, "wxyz", "rstw")]
I need to create a new RDD that has all the rows found in rdd2 that don't have corresponding matches in rdd1. So the created RDD should contain the following data:
rdd3 = [(87, "whsh", "jnmk")]
Does anyone know how to accomplish this?
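For reference, the result being asked for is a set difference: the rows of rdd2 that do not appear in rdd1. The sketch below checks that reading on plain Scala collections rather than Spark, assuming a "match" means the whole tuple is equal. The object name ExpectedResult and the helper onlyInSecond are hypothetical.

```scala
object ExpectedResult {
  // Sample rows from the question, with the string fields quoted.
  val rdd1 = List((12, "abcd", "lmno"), (45, "wxyz", "rstw"), (67, "asdf", "wert"))
  val rdd2 = List((12, "abcd", "lmno"), (87, "whsh", "jnmk"), (45, "wxyz", "rstw"))

  // Assumption: a row "matches" only if the whole tuple is equal.
  // Keep the rows of `second` that never occur in `first`.
  def onlyInSecond[T](first: Seq[T], second: Seq[T]): Seq[T] = {
    val seen = first.toSet
    second.filterNot(seen.contains)
  }

  def main(args: Array[String]): Unit =
    println(onlyInSecond(rdd1, rdd2)) // List((87,whsh,jnmk))
}
```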
You can do a full outer join and then derive two new RDDs from it: one with the keys found only in rdd1 and one with the keys found only in rdd2.
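You'll need to first convert them to key-value (pair) RDDs, keyed on the first field. Sample code below (Scala):
val kv1 = rdd1.map { case (k, a, b) => (k, (a, b)) }
val kv2 = rdd2.map { case (k, a, b) => (k, (a, b)) }
val rdd3 = kv1.fullOuterJoin(kv2).filter { case (_, (l, r)) => l.isEmpty && r.isDefined }.map { case (k, (_, r)) => (k, r.get._1, r.get._2) }
(There is a more idiomatic way to unwrap the Option than .get, but this works because the filter guarantees r is defined.)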
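The join-then-filter idea can be tried without a cluster. This is a minimal local sketch on plain Scala Maps, not Spark's API: keyed and fullOuterJoin here are hypothetical stand-ins that mimic the shape Spark's pair-RDD fullOuterJoin returns (an Option for each side). Filtering for None on one side yields the two "only on one side" collections.

```scala
object FullOuterJoinSketch {
  type Row = (Int, String, String)

  // Key each row by its first field, mirroring the "convert to KV" step.
  def keyed(rows: Seq[Row]): Map[Int, (String, String)] =
    rows.map { case (k, a, b) => (k, (a, b)) }.toMap

  // Hypothetical local stand-in for a full outer join: every key from either
  // side, with an Option per side (None when the key is missing there).
  def fullOuterJoin[V, W](left: Map[Int, V], right: Map[Int, W]): Map[Int, (Option[V], Option[W])] =
    (left.keySet ++ right.keySet).map(k => (k, (left.get(k), right.get(k)))).toMap

  def main(args: Array[String]): Unit = {
    val rdd1 = Seq((12, "abcd", "lmno"), (45, "wxyz", "rstw"), (67, "asdf", "wert"))
    val rdd2 = Seq((12, "abcd", "lmno"), (87, "whsh", "jnmk"), (45, "wxyz", "rstw"))
    val joined = fullOuterJoin(keyed(rdd1), keyed(rdd2))

    // The two derived collections: keys present on only one side.
    val onlyIn2 = joined.collect { case (k, (None, Some((a, b)))) => (k, a, b) }.toList
    val onlyIn1 = joined.collect { case (k, (Some((a, b)), None)) => (k, a, b) }.toList
    println(onlyIn2) // List((87,whsh,jnmk))
    println(onlyIn1) // List((67,asdf,wert))
  }
}
```

Keying by the first field assumes ids are unique within each input: with duplicate keys, toMap keeps only the last row per key, whereas a Spark join would pair up every matching row.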