
Joining two (paired) RDDs in Scala, Spark

I am trying to join two paired RDDs, as per the answer provided here

Joining two RDD[String] -Spark Scala

I am getting an error

error: value leftOuterJoin is not a member of org.apache.spark.rdd.RDD[

The code snippet is below.

val pairRDDTransactions = parsedTransaction.map {
  case (field3, field4, field5, field6, field7,
        field1, field2, udfChar1, udfChar2, udfChar3) =>
    ((field1, field2), field3, field4, field5,
     field6, field7, udfChar1, udfChar2, udfChar3)
}



val pairRDDAccounts = parsedAccounts.map {
  case (field8, field1, field2, field9, field10) =>
    ((field1, field2), field8, field9, field10)
}



val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
  case ((field1, field2), (field3, field4, field5, field6,
        field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)) =>
    (field1, field2, field3, field4, field5, field6,
     field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)
}

In this case, field1 and field2 are my keys, on which I want to perform the join.

Joins are defined only for RDD[(K, V)] (an RDD of Tuple2 objects). In your case, however, the elements are arbitrary tuples (Tuple9[_, _, _, _, _, _, _, _, _] for the transactions and Tuple4[_, _, _, _] for the accounts), so this simply cannot work.
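The arity difference is easy to see in plain Scala: only an element type of Tuple2 triggers Spark's implicit conversion to PairRDDFunctions, which is where leftOuterJoin is defined. The field values below are hypothetical placeholders:

```scala
// Element shape produced by the question's map: one flat Tuple9.
val nineTuple = (("f1", "f2"), "f3", "f4", "f5", "f6", "f7", "u1", "u2", "u3")

// Element shape Spark's joins require: a Tuple2 of (key, value).
val keyValue = (("f1", "f2"), ("f3", "f4", "f5", "f6", "f7", "u1", "u2", "u3"))

println(nineTuple.productArity) // 9 -> RDD[Tuple9[...]]: no join methods available
println(keyValue.productArity)  // 2 -> RDD[(K, V)]: PairRDDFunctions applies
```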

You should map to (key, value) pairs instead:

... =>
  ((field1, field2),
   (field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))

and

... =>
  ((field1, field2), (field8, field9, field10))

respectively.
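With both RDDs keyed that way, pairRDDTransactions.leftOuterJoin(pairRDDAccounts) compiles and yields RDD[((K1, K2), (V, Option[W]))]. Here is a minimal sketch of the same semantics on plain Scala collections, with no SparkContext needed; the keys and values are hypothetical placeholders standing in for the real fields:

```scala
type Key = (String, String)

// Transactions keyed by (field1, field2); the value stands in for
// (field3 ... udfChar3), collapsed to one String for brevity.
val transactions: Seq[(Key, String)] = Seq(
  (("acct1", "usd"), "txnA"),
  (("acct2", "eur"), "txnB")
)

// Accounts keyed the same way; the value stands in for (field8, field9, field10).
val accounts: Map[Key, String] = Map(
  ("acct1", "usd") -> "acctRowA"
)

// leftOuterJoin semantics: keep every left record, attach Option of the right value.
val joined: Seq[(Key, (String, Option[String]))] =
  transactions.map { case (k, v) => (k, (v, accounts.get(k))) }

joined.foreach(println)
// ((acct1,usd),(txnA,Some(acctRowA)))
// ((acct2,eur),(txnB,None))
```

Note that unmatched left keys survive the join with None on the right, which is exactly why the joined case pattern must destructure a (value, Option[value]) pair rather than one flat tuple.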
