
Subtract two RDDs that contain a List as value in Spark/Scala

I am new to Scala. I have two RDDs of the following type:

RDD[(Long, List[Long])]

For each key, I want to subtract the List[Long] value in the second RDD from the List[Long] value in the first RDD.

For example:

rddPair1 contains:

((4,List(5)), (1,List(2)), (2,List(4, 3, 4)), (3,List(6, 4)))

rddPair2 contains:

((5,List(6)), (2,List(3)), (3,List(4)))

I want the resulting RDD to look something like this:

(4,List(5)), (1,List(2)), (2,List(4, 4)), (3,List(6))

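Here keys 2 and 3 appear in both RDDs, so for these keys the list in rddPair2 is subtracted from the list in rddPair1. Keys that appear only in rddPair1 (4 and 1) are kept unchanged, and the key that appears only in rddPair2 (5) is dropped.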

Thanks in advance.

You can use leftOuterJoin to pair each list in rddPair1 with the matching list in rddPair2 (if any), and then mapValues to subtract one from the other:

import org.apache.spark.rdd.RDD

val result: RDD[(Long, List[Long])] = rddPair1.leftOuterJoin(rddPair2).mapValues {
  case (l1, Some(l2)) => l1.diff(l2) // key in both RDDs: remove l2's elements from l1
  case (l1, None)     => l1          // key only in rddPair1: keep l1 as is
}
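
To check the per-key logic without a Spark cluster, here is a minimal sketch in plain Scala. It assumes local Maps as stand-ins for the two pair RDDs, and subtractLists is a hypothetical helper name, not part of Spark. It applies the same two cases as the mapValues call above to the sample data from the question.

```scala
// Plain-Scala sketch: local Maps stand in for the two pair RDDs (an assumption
// for illustration only; no Spark dependency is needed to run this).
object SubtractListsSketch {
  // Hypothetical helper mirroring leftOuterJoin + mapValues on local data.
  def subtractLists(
      left: Map[Long, List[Long]],
      right: Map[Long, List[Long]]
  ): Map[Long, List[Long]] =
    left.map { case (key, l1) =>
      right.get(key) match {
        case Some(l2) => key -> l1.diff(l2) // key on both sides: remove l2's elements from l1
        case None     => key -> l1          // key only on the left: keep l1 unchanged
      }
    }

  // Sample data copied from the question.
  val pair1: Map[Long, List[Long]] = Map(
    4L -> List(5L), 1L -> List(2L), 2L -> List(4L, 3L, 4L), 3L -> List(6L, 4L))
  val pair2: Map[Long, List[Long]] = Map(
    5L -> List(6L), 2L -> List(3L), 3L -> List(4L))

  val result: Map[Long, List[Long]] = subtractLists(pair1, pair2)

  def main(args: Array[String]): Unit = {
    // Print in key order so the output is deterministic:
    // (1,List(2)), (2,List(4, 4)), (3,List(6)), (4,List(5))
    result.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Note that diff is a multiset difference: it removes one occurrence from l1 for each element of l2, so List(4, 3, 4).diff(List(4)) gives List(3, 4). If every occurrence should be removed instead, l1.filterNot(x => l2.contains(x)) is one alternative. Also keep in mind that in Spark, leftOuterJoin emits one row per matching pair, so duplicate keys in rddPair2 would produce several rows for the same key, and the order of the output is not guaranteed.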


 