I have two RDDs. The first is the original RDD, and the second is an RDD that I filtered out of the original and then processed further. After that processing, I want to join them back together. The original RDD looks like this:
(1,5)
(2,60)
(3,7)
(4,1)
(5,1)
...
(10,8)
and the filtered and processed RDD is:
(4,3)
(5,10)
(6,6)
(7,9)
How should I join them? When I use fullOuterJoin or the other join methods, I get an error.
Edit:
I wrote the code as suggested, like this:
original_RDD = original_RDD.fullOuterJoin(new_RDD).foreach { case (joinKey, (oldOption, newOption)) =>
  newOption match {
    case None => (joinKey, oldOption)
    case Some(newOption) => (joinKey, newOption)
  }
}
but I get this error:
Error:(232, 55) type mismatch;
found : Unit
required: org.apache.spark.rdd.RDD[(Long, Int)]
nodes=nodes.fullOuterJoin(joined_new).foreach { case (joinKey, (oldOption, newOption)) =>
See the join syntax described in the Spark programming guide:
When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key. Outer joins are supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.
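Because a key may be missing from either input, fullOuterJoin wraps both sides of each result pair in an Option. A hand-worked sketch of a few result pairs, using the sample data from the question:

```scala
// original.fullOuterJoin(filtered) pairs every key with two Options:
// (1, (Some(5), None))       // key 1 exists only in the original RDD
// (4, (Some(1), Some(3)))    // key 4 exists in both RDDs
// (5, (Some(1), Some(10)))   // key 5 exists in both RDDs
```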
originalRdd
  .fullOuterJoin(joinRdd)
  .foreach { case (joinKey, (oldOption, newOption)) =>
    newOption match {
      case None => println("new value is None")
      case Some(joinValue) => println(s"new value = $joinValue")
    }
  }
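As for the compile error in the edited question: foreach is an action, and actions return Unit, which is exactly why assigning its result back to an RDD variable fails with "found: Unit, required: RDD[(Long, Int)]". To produce a new RDD you need a transformation such as map. A minimal sketch of the merge the question describes (keep the processed value where one exists, otherwise the old one), assuming both RDDs are RDD[(Long, Int)] and the variable name merged is our own:

```scala
val merged: RDD[(Long, Int)] =
  original_RDD.fullOuterJoin(new_RDD).map { case (key, (oldOption, newOption)) =>
    // Prefer the processed value; fall back to the original. Since the
    // filtered RDD's keys are a subset of the original's keys, at least
    // one of the two Options is always defined, so .get is safe here.
    (key, newOption.orElse(oldOption).get)
  }
```

In fact, because every key of the filtered RDD also exists in the original, a leftOuterJoin would work too and keeps the left-hand value unwrapped: original_RDD.leftOuterJoin(new_RDD).map { case (k, (oldValue, newOpt)) => (k, newOpt.getOrElse(oldValue)) }.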