Joining two (paired) RDDs in Scala, Spark
I am trying to join two paired RDDs, as per the answer provided here: Joining two RDD[String] - Spark Scala
I am getting an error:

error: value leftOuterJoin is not a member of org.apache.spark.rdd.RDD[
The code snippet is as below.
val pairRDDTransactions = parsedTransaction.map
{
case ( field3, field4, field5, field6, field7,
field1, field2, udfChar1, udfChar2, udfChar3) =>
((field1, field2), field3, field4, field5,
field6, field7, udfChar1, udfChar2, udfChar3)
}
val pairRDDAccounts = parsedAccounts.map
{
case (field8, field1, field2, field9, field10 ) =>
((field1, field2), field8, field9, field10)
}
val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
case ((field1, field2), (field3, field4, field5, field6,
field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)) =>
(field1, field2, field3, field4, field5, field6,
field7, udfChar1, udfChar2, udfChar3, field8, field9, field10)
}
In this case, field1 and field2 are my keys, on which I want to perform the join.
Joins are defined for RDD[(K, V)] (an RDD of Tuple2 objects). In your case, however, you have arbitrary tuples (a Tuple9[_, _, _, _, _, _, _, _, _] and a Tuple4[_, _, _, _]) - this just cannot work.
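To see why, note that leftOuterJoin is not defined on RDD itself; it comes from PairRDDFunctions, which Spark adds through an implicit conversion that only applies when the element type is exactly a Tuple2. A minimal sketch (assuming an existing SparkContext named sc):

```scala
import org.apache.spark.rdd.RDD

// Element type is (K, V), so the PairRDDFunctions implicit applies:
val a: RDD[((String, String), Int)] =
  sc.parallelize(Seq((("k1", "k2"), 1)))
val b: RDD[((String, String), String)] =
  sc.parallelize(Seq((("k1", "k2"), "x")))

// Compiles; result type is RDD[((String, String), (Int, Option[String]))]
val joined = a.leftOuterJoin(b)

// With any other tuple arity the conversion does not apply:
// val c: RDD[(String, String, Int)] = ...
// c.leftOuterJoin(b)  // error: value leftOuterJoin is not a member of RDD[(String, String, Int)]
```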
You should instead produce

... =>
((field1, field2),
(field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))

and

... =>
((field1, field2), (field8, field9, field10))

respectively.
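Putting it together, here is a sketch of the corrected pipeline (assuming the parsedTransaction and parsedAccounts RDDs from the question, and assuming the account fields are Strings; adjust the defaults to your types). Note that leftOuterJoin returns RDD[(K, (V, Option[W]))], so the join result must be matched with an Option on the right-hand side:

```scala
// Key both RDDs by (field1, field2); all remaining fields go into ONE value tuple,
// so each element is a Tuple2 and PairRDDFunctions (leftOuterJoin etc.) applies.
val pairRDDTransactions = parsedTransaction.map {
  case (field3, field4, field5, field6, field7,
        field1, field2, udfChar1, udfChar2, udfChar3) =>
    ((field1, field2),
     (field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3))
}

val pairRDDAccounts = parsedAccounts.map {
  case (field8, field1, field2, field9, field10) =>
    ((field1, field2), (field8, field9, field10))
}

// leftOuterJoin yields (key, (transactionValues, Option[accountValues])):
// transactions with no matching account appear with None on the right.
val transactionAddrJoin = pairRDDTransactions.leftOuterJoin(pairRDDAccounts).map {
  case ((field1, field2),
        ((field3, field4, field5, field6, field7, udfChar1, udfChar2, udfChar3),
         accountOpt)) =>
    // Hypothetical defaults for unmatched transactions; substitute what fits your schema.
    val (field8, field9, field10) = accountOpt.getOrElse(("", "", ""))
    (field1, field2, field3, field4, field5, field6, field7,
     udfChar1, udfChar2, udfChar3, field8, field9, field10)
}
```

If you want only transactions that have a matching account, use join instead of leftOuterJoin; it returns RDD[(K, (V, W))] with no Option to unwrap.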