I am trying to join two pair RDDs, following the answer given here:
Joining two RDD[String] -Spark Scala
I am getting this error:
error: value leftOuterJoin is not a member of org.apache.spark.rdd.RDD[
The code snippet is below.
val pairRDDTransactions = parsedTransaction.map {
  case (field3, field4, field5, field6, field7,
        field1, field2, udfChar1, udfChar2, udfChar3) =>
    ((field1, field2), field3, field4, field5,
     field6, field7, udfChar1, udfChar2, udfChar3)
}
val pairRDDAccounts = parsedAccounts.map {
  case (field8, field1, field2, field9, field10) =>
    ((field1, field2), field8, field9, field10)
}
val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
  case ((field1, field2), (field3, field4, field5, field6,
        field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)) =>
    (field1, field2, field3, field4, field5, field6,
     field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)
}
In this case, field1 and field2 are my keys, on which I want to perform the join.
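Joins are defined only for RDD[(K, V)], that is, an RDD of Tuple2 objects. In your case, however, the elements are arbitrary tuples (Tuple4[_, _, _, _] for the accounts and Tuple9[_, _, _, _, _, _, _, _, _] for the transactions, with the key as just the first of several elements), so this cannot work.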
You should map each RDD to key-value pairs, with all non-key fields grouped into a single value tuple:
... =>
((field1, field2),
(field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))
and
... =>
((field1, field2), (field8, field9, field10))
respectively.
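Putting the two mappings together, the whole join might look like the sketch below. It assumes Spark is on the classpath, that every field is a String (the question does not say), and it uses made-up sample rows in place of the real parsedTransaction and parsedAccounts. Note also that leftOuterJoin yields (key, (leftValue, Option[rightValue])), so the final pattern match in the question has to change as well; filling missing account fields with empty strings is only one possible choice, not something the question specifies.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairJoinSketch {
  // On Spark versions before 1.3 the pair-RDD implicits also need:
  //   import org.apache.spark.SparkContext._

  def joinedRows(sc: SparkContext) = {
    // Hypothetical sample data standing in for the question's parsed RDDs.
    val parsedTransaction = sc.parallelize(Seq(
      ("f3", "f4", "f5", "f6", "f7", "k1", "k2", "u1", "u2", "u3"),
      ("g3", "g4", "g5", "g6", "g7", "x1", "x2", "v1", "v2", "v3")
    ))
    val parsedAccounts = sc.parallelize(Seq(
      ("f8", "k1", "k2", "f9", "f10")
    ))

    // Shape each record as (key, value): a Tuple2 whose value is itself a tuple.
    val pairRDDTransactions = parsedTransaction.map {
      case (field3, field4, field5, field6, field7,
            field1, field2, udfChar1, udfChar2, udfChar3) =>
        ((field1, field2),
         (field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))
    }
    val pairRDDAccounts = parsedAccounts.map {
      case (field8, field1, field2, field9, field10) =>
        ((field1, field2), (field8, field9, field10))
    }

    // The right-hand side arrives as an Option: None when no account has the key.
    val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
      case ((field1, field2),
            ((field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3),
             account)) =>
        // Assumption: unmatched transactions get empty strings for account fields.
        val (field8, field9, field10) = account.getOrElse(("", "", ""))
        (field1, field2, field3, field4, field5, field6, field7,
         udfChar1, udfChar2, udfChar3, field8, field9, field10)
    }

    transactionAddrJoin.collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("pair-join-sketch")
    val sc = new SparkContext(conf)
    try joinedRows(sc).foreach(println)
    finally sc.stop()
  }
}
```

If unmatched transactions should be dropped rather than padded, join can be used in place of leftOuterJoin, in which case the right-hand value is a plain tuple and no Option handling is needed.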