
Spark left outer join when left side key is Option[]

I have

val rdd1: RDD[(Option[String], (String, Option[Context]))]

and

val rdd2: RDD[(String, Double)]

Now, I would like to call rdd1.leftOuterJoin(rdd2), but of course I can't, because Option[String] is a different type than String.

The rationale for the join operation is that when rdd1's key contains some value, I would like to attach additional information to it. The desired output is of type RDD[(Option[String], ((String, Option[Context]), Option[Double]))].

What's the workaround?

You can simply map rdd2 to RDD[(Option[String], Double)]:

rdd1.leftOuterJoin(rdd2.map{case (k, v) => (Option(k), v)})
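The effect of lifting rdd2's keys into Option can be sketched without a Spark dependency using plain Scala collections. This is only an analogue of leftOuterJoin (a Map lookup assumes unique keys in rdd2, whereas the real operation emits one row per matching pair); Context and the sample data are hypothetical stand-ins:

```scala
// Hypothetical stand-in for the question's Context type.
case class Context(tag: String)

// Sample data shaped like rdd1 and rdd2 in the question.
val rdd1: Seq[(Option[String], (String, Option[Context]))] = Seq(
  (Some("a"), ("x", Some(Context("c1")))),
  (None,      ("y", None))
)
val rdd2: Seq[(String, Double)] = Seq(("a", 1.0), ("b", 2.0))

// Lift rdd2's keys into Option, as in the answer's map step.
val lifted: Map[Option[String], Double] =
  rdd2.map { case (k, v) => (Option(k), v) }.toMap

// Left outer join by key: keep every left row, attach Option[Double].
val joined: Seq[(Option[String], ((String, Option[Context]), Option[Double]))] =
  rdd1.map { case (k, v) => (k, (v, lifted.get(k))) }
```

Note that a None key on the left joins like any other key here: it simply finds no match and yields None on the right, which matches Spark's behaviour for leftOuterJoin on RDDs (None is an ordinary key value).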

If Context can be expressed using Spark SQL types, then you can simply convert both RDDs to DataFrames and join. None values are mapped to NULLs, so everything should work as expected.

