Spark left outer join when left side key is Option[]
I have

val rdd1: RDD[(Option[String], (String, Option[Context]))]

and

val rdd2: RDD[(String, Double)]

Now I would like to rdd1.leftOuterJoin(rdd2), but of course I can't, because Option[String] is a different key type than String.
The rationale for the join is that whenever rdd1's key contains some value, I would like to attach additional information to it. The desired output is of type:

RDD[(Option[String], ((String, Option[Context]), Option[Double]))]
What's the workaround?
You can simply map rdd2 to RDD[(Option[String], Double)]:

rdd1.leftOuterJoin(rdd2.map { case (k, v) => (Option(k), v) })
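To see why wrapping rdd2's keys in Option makes the join line up, here is a minimal sketch that mimics leftOuterJoin's semantics with plain Scala collections (no Spark needed); the helper function and the sample payloads are illustrative stand-ins, not Spark API:

```scala
// Stand-in for RDD.leftOuterJoin, implemented over plain Seqs:
// every left record is kept; matches on the right become Some(w), misses become None.
def leftOuterJoin[K, V, W](left: Seq[(K, V)],
                           right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
  val byKey: Map[K, Seq[(K, W)]] = right.groupBy(_._1)
  left.flatMap { case (k, v) =>
    byKey.get(k) match {
      case None     => Seq((k, (v, Option.empty[W])))
      case Some(ws) => ws.map { case (_, w) => (k, (v, Some(w))) }
    }
  }
}

// rdd1-like data: the keys are Option[String]
val left: Seq[(Option[String], String)] =
  Seq(Some("a") -> "payload-a", None -> "payload-none")

// rdd2-like data: plain String keys, so wrap them in Option before joining
val right: Seq[(String, Double)] = Seq("a" -> 1.0)
val rightWrapped: Seq[(Option[String], Double)] =
  right.map { case (k, v) => (Option(k), v) }

val joined = leftOuterJoin(left, rightWrapped)
// Some("a") finds its match; the None-keyed record survives with no right-side value.
```

Note that using Option(k) rather than Some(k) in the wrapping step also turns any null key in rdd2 into None, so it would line up with None keys on the left.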
If Context can be expressed using Spark SQL types, then you can simply convert both RDDs to DataFrames and join. None values are mapped to NULLs, so everything should work as expected.