I have two different RDDs that share some common fields. Based on those fields, I want to get the unmatched records from RDD1 or RDD2: records available in RDD1 but not in RDD2, and records available in RDD2 but not in RDD1.
It seems we could use `subtract` or `subtractByKey`.
Sample Input:
**File 1:**
sam,23,cricket
alex,34,football
ann,21,football
**File 2:**
ruby,25,football,usa
alex,34,cricket,usa
ann,21,cricket,usa
**expected output:**
sam,23,cricket
Update:
Currently I am using Spark SQL to get the unmatched records from the RDDs (writing a query to get the unmatched records).
What I am looking for is whether this can be done with Spark Core itself instead of Spark SQL. I am not looking for code; is there an operation available in Spark Core for this?
Please advise on this.
Regards, Shankar.
You could bring both RDDs to the same shape and use `subtract` to remove the common elements.
Given `rdd1` from file1 and `rdd2` from file2 as presented above, you could do something like:
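// Assumes rdd1 and rdd2 have already been parsed into tuples, e.g. via line.split(",").
// Drop the country field so both RDDs hold (name, score, sport) tuples.
val userScore2 = rdd2.map { case (name, score, sport, country) => (name, score, sport) }
// subtract compares whole elements, so a record counts as common only if all three fields match.
val in1andNotin2 = rdd1 subtract userScore2
val in2andNotIn1 = userScore2 subtract rdd1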
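Note that in the sample data the sport differs between the two files for alex and ann, so a whole-record `subtract` would still report them as unmatched. If the intent is to match only on name and age (an assumption inferred from the expected output), `subtractByKey` on a pair RDD is one possible route. The following is a minimal sketch, not a definitive implementation: the object `UnmatchedRecords` and the helpers `keyByNameAge` and `unmatched` are hypothetical names, and the in-memory sequences stand in for the two files.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object UnmatchedRecords {
  // Hypothetical helper: key a CSV line by its first two fields (name, age)
  // and keep the full original line as the value.
  def keyByNameAge(line: String): ((String, String), String) = {
    val fields = line.split(",").map(_.trim)
    ((fields(0), fields(1)), line)
  }

  // Lines of `left` whose (name, age) key does not occur in `right`.
  def unmatched(left: RDD[String], right: RDD[String]): RDD[String] =
    left.map(keyByNameAge).subtractByKey(right.map(keyByNameAge)).values

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("unmatched-records").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Stand-ins for sc.textFile(...) on File 1 and File 2.
    val rdd1 = sc.parallelize(Seq("sam,23,cricket", "alex,34,football", "ann,21,football"))
    val rdd2 = sc.parallelize(Seq("ruby,25,football,usa", "alex,34,cricket,usa", "ann,21,cricket,usa"))

    unmatched(rdd1, rdd2).collect().foreach(println) // sam,23,cricket
    unmatched(rdd2, rdd1).collect().foreach(println) // ruby,25,football,usa

    sc.stop()
  }
}
```

Keying by (name, age) means the sport and country columns are ignored when deciding whether two records match, while the full original line is still what gets returned.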