简体   繁体   中英

RDD lookup inside a transformation

I have 2 paired RDDs like below

RDD1 contains name as key and zipcode as value:

RDD1 -> RDD( (ashley, 20171), (yash, 33613), (evan, 40217) )

RDD2 contains zip code as key and some random number as value:

RDD2 -> RDD( (20171, 235523), (33613, 345345345), (40189, 44355217), (40122, 2345235), (40127, 232323424) )

I need to replace the zipcodes in RDD1 with the corresponding values from RDD2. So the output would be

RDD3 -> RDD( (ashley, 235523), (yash, 345345345), (evan, 232323424) )

I tried doing it using the RDD lookup method like below but I got exception saying that RDD transformations cannot be perfomed inside another RDD transformation

val rdd3 = rdd1.map( x => (x._1, rdd2.lookup(x._2)(0)) )

Yon can simply join 2 RDDs by zipcode:

rdd1.map({case (name, zipcode) => (zipcode, name)})
    .join(rdd2)
    .map({case (zipcode, (name, number)) => (name, number)})
    .collect()

Note, this will return only records, that have matching zipcodes in rdd1 and rdd2. If you want to set some default number to records in rdd1, that doesn't have corresponding zipcode in rdd2, use leftOuterJoin insted of join :

rdd1.map({case (name, zipcode) => (zipcode, name)})
    .leftOuterJoin(rdd2)
    .map({case (zipcode, (name, number)) => (name, number.getOrElse(0))})
    .collect()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM