I have two RDDs that need to be joined:
val rdd1: RDD[(v_id, inputObject1)]
where v_id is a unique ID
and inputObject1 has the following fields:
g_id, p_id, timestamp = t1
I also have another RDD:
val rdd2: RDD[(g_id, inputObject2)]
where inputObject2 has the following fields:
p_id, timestamp = t2, e_id
Now I want to join these two RDDs on the conditions below.
The second condition is a fallback, used only if the first condition is not met. My final output should be:
val resultRDD: RDD[(v_id, inputObject11)]
where inputObject11 is inputObject1 plus the e_id from the second RDD, if the conditions are met.
So its fields will be:
g_id, p_id, e_id, timestamp = t1
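The record shapes described above can be modeled as Scala case classes. This is a minimal sketch: the class names, the camelCase field names and the field types (String IDs, Long timestamps) are assumptions, and e_id is modeled as `Option[String]` on the assumption that a record with no matching row should still appear, just without an e_id.

```scala
// Hypothetical models of the question's records; field types are assumed.
case class InputObject1(gId: String, pId: String, t1: Long)
case class InputObject2(pId: String, t2: Long, eId: String)

// Output record: the fields of InputObject1 plus e_id taken from rdd2.
// eId is None when neither condition matched (assumption).
case class InputObject11(gId: String, pId: String, eId: Option[String], t1: Long)

object InputObject11 {
  // Copy every field of InputObject1 and attach the (optional) e_id.
  def from(o1: InputObject1, eId: Option[String]): InputObject11 =
    InputObject11(o1.gId, o1.pId, eId, o1.t1)
}
```

For example, `InputObject11.from(InputObject1("g1", "p1", 100L), Some("e7"))` keeps g_id, p_id and t1 unchanged and only adds the e_id.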
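This is not possible as a single join: the join operations in PairRDDFunctions (join, leftOuterJoin, cogroup and so on) only match records
whose keys are equal; they cannot take an arbitrary condition.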
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
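Because joins only match on keys, one common workaround (a sketch, not the answer's own method) is to re-key rdd1 by g_id, the field both RDDs share, `cogroup` it with rdd2, and then apply the two conditions inside each g_id group. The question's actual conditions are not shown, so `primary` and `fallback` below are hypothetical predicate parameters. The case-class names, field types (String IDs, Long timestamps) and the choice to keep unmatched rows with `eId = None` are also assumptions.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical record types; field types are assumed.
case class InputObject1(gId: String, pId: String, t1: Long)
case class InputObject2(pId: String, t2: Long, eId: String)
case class InputObject11(gId: String, pId: String, eId: Option[String], t1: Long)

object ConditionalJoin {
  // A join condition between a left record and a candidate right record.
  type Cond = (InputObject1, InputObject2) => Boolean

  def join(
      rdd1: RDD[(String, InputObject1)], // (v_id, obj1)
      rdd2: RDD[(String, InputObject2)], // (g_id, obj2)
      primary: Cond,                     // hypothetical first condition
      fallback: Cond                     // hypothetical fallback condition
  ): RDD[(String, InputObject11)] = {
    // Re-key rdd1 by g_id so both sides share a key; keep v_id in the value.
    val byGId: RDD[(String, (String, InputObject1))] =
      rdd1.map { case (vId, o1) => (o1.gId, (vId, o1)) }

    // cogroup yields, per g_id, all left records and all right records.
    // Right-only groups have no left records, so they emit nothing.
    byGId.cogroup(rdd2).flatMap { case (_, (lefts, rights)) =>
      val candidates = rights.toSeq
      lefts.map { case (vId, o1) =>
        // Try the first condition; only if nothing matches, try the fallback.
        val matched = candidates
          .find(primary(o1, _))
          .orElse(candidates.find(fallback(o1, _)))
        (vId, InputObject11(o1.gId, o1.pId, matched.map(_.eId), o1.t1))
      }
    }
  }
}
```

Note that `cogroup` brings every rdd2 row for a given g_id together in one task, so a heavily skewed g_id can make that task large. If several rows satisfy the same condition, `find` takes whichever comes first in the group, and that order is not guaranteed.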