
Join and map 2 RDDs conditionally

I have two RDDs that need to be joined:

val rdd1: RDD[(v_id, inputObject1)]

where v_id is a unique id

and inputObject1 has the following fields:

g_id, p_id, timestamp=t1

I also have another RDD:

val rdd2: RDD[(g_id, inputObject2)]

where inputObject2 has the following fields:

p_id, timestamp=t2, e_id

I want to join these two RDDs on the following conditions:

  • If g_id and p_id are both the same and |t1 - t2| < 30 minutes
  • Else if only g_id is the same and |t1 - t2| < 30 minutes

So the second condition is a fallback for when the first condition is not met. My final output should be this:
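To make the two-step rule concrete, here is a minimal sketch of the per-record decision in plain Scala, without Spark. The case classes, the String ids, the epoch-millisecond timestamps and the helper name pickEId are all assumptions made for illustration. The question also does not say which row wins when several rows qualify, so this sketch simply takes the first one it finds.

```scala
// Hypothetical record types: field names follow the question, but the types
// (String ids, epoch-millisecond timestamps) are assumptions.
final case class InputObject1(gId: String, pId: String, t1: Long)
final case class InputObject2(pId: String, t2: Long, eId: String)

object MatchRule {
  // 30 minutes expressed in milliseconds.
  val WindowMs: Long = 30L * 60L * 1000L

  /** Picks an e_id for `o1` from candidates that already share its g_id.
    * Returns None when no candidate lies inside the 30-minute window.
    */
  def pickEId(o1: InputObject1, sameGroup: Seq[InputObject2]): Option[String] = {
    // Both conditions require |t1 - t2| < 30 minutes, so filter on that first.
    val inWindow = sameGroup.filter(o2 => math.abs(o1.t1 - o2.t2) < WindowMs)
    inWindow
      .find(_.pId == o1.pId)       // condition 1: p_id matches as well
      .orElse(inWindow.headOption) // condition 2: fall back to g_id alone
      .map(_.eId)
  }
}
```

Because the time window applies to both conditions, it is checked once up front, and the p_id comparison only decides which of the remaining candidates is preferred.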

val resultRDD: RDD[(v_id, inputObject11)]

where inputObject11 is inputObject1 with the e_id from the second RDD added, if the conditions are met.

So the fields will be:

g_id, p_id, e_id, timestamp=t1
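One possible shape for the whole join, offered as a sketch under stated assumptions rather than a definitive answer: re-key the first RDD by g_id, group the second RDD by g_id, left-outer-join the two, and apply the preferred/fallback rule per record. The case classes, their field types, the object name ConditionalJoin, and the choice to model a missing e_id as Option are assumptions; the question does not say what the output should hold when neither condition is met.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical record types: field names follow the question, types are assumed.
case class InputObject1(gId: String, pId: String, t1: Long) // t1 in epoch millis
case class InputObject2(pId: String, t2: Long, eId: String) // t2 in epoch millis
case class InputObject11(gId: String, pId: String, eId: Option[String], t1: Long)

object ConditionalJoin {
  // 30 minutes expressed in milliseconds.
  val WindowMs: Long = 30L * 60L * 1000L

  def join(rdd1: RDD[(String, InputObject1)],  // keyed by v_id
           rdd2: RDD[(String, InputObject2)])  // keyed by g_id
      : RDD[(String, InputObject11)] = {
    // Re-key rdd1 by g_id so both sides share the join key; keep v_id in the value.
    val byGId: RDD[(String, (String, InputObject1))] =
      rdd1.map { case (vId, o1) => (o1.gId, (vId, o1)) }

    // Gather every rdd2 row for a g_id so each rdd1 row is emitted exactly once.
    val candidates: RDD[(String, Iterable[InputObject2])] = rdd2.groupByKey()

    // Left outer join keeps rdd1 rows whose g_id never appears in rdd2.
    byGId.leftOuterJoin(candidates).map { case (_, ((vId, o1), maybeCands)) =>
      val inWindow = maybeCands
        .getOrElse(Iterable.empty[InputObject2])
        .filter(o2 => math.abs(o1.t1 - o2.t2) < WindowMs)
      val eId = inWindow
        .find(_.pId == o1.pId)       // condition 1: g_id and p_id match
        .orElse(inWindow.headOption) // condition 2: g_id alone matches
        .map(_.eId)
      (vId, InputObject11(o1.gId, o1.pId, eId, o1.t1))
    }
  }
}
```

A plain rdd1-to-rdd2 join would emit one output row per matching pair, which is why the candidates are grouped first. The trade-off is that groupByKey brings all rows for a g_id onto one executor, so heavily skewed g_id values may need a different strategy.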
