
Null value Left Outer Join in Spark

I have two RDDs and performed a left outer join on them.

The left join p1.leftOuterJoin(p2) gives a result like:

Array[((String, String), (Int, Option[Int]))] = 
Array(((1001-150329-002-0-04624,5060567),(1,None)), ((1002-141105-008-0-01934,10145500),(1,None)), ((1013-150324-009-0-02270,15750046),(1,None)), ((1005-150814-005-0-05885,5060656),(1,Some(1))), ((1009-150318-004-0-02537,5060583),(1,None)))

I want to replace every None with 0 and get a clean data set like:

Array(((1001-150329-002-0-04624,5060567),0), ((1002-141105-008-0-01934,10145500),0), ((1013-150324-009-0-02270,15750046),0), ((1005-150814-005-0-05885,5060656),1), ((1009-150318-004-0-02537,5060583),0))

Basically, replace every (1,None) with 0 and every (1,Some(1)) with 1.

If you want the value inside the Option when it is Some, and 0 when it is None, I would implement it with .map and .getOrElse (here a is the joined RDD):

a.map { case (k, (_, o)) => (k, o.getOrElse(0)) }

The result matches the expected one:

Array(
  ((1001-150329-002-0-04624,5060567),0),
  ((1002-141105-008-0-01934,10145500),0),
  ((1013-150324-009-0-02270,15750046),0),
  ((1005-150814-005-0-05885,5060656),1),
  ((1009-150318-004-0-02537,5060583),0))
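
For anyone who wants to try the pattern without a cluster, here is a minimal sketch of the same transformation on a plain Scala Seq. The object name ReplaceNoneWithZero, the helper replaceNone, and the two sample rows are assumptions made for illustration; only the map/getOrElse body comes from the answer above. The same closure should work unchanged on the RDD returned by leftOuterJoin.

```scala
object ReplaceNoneWithZero {
  // Key shape mirrors the (String, String) keys shown in the question.
  type Key = (String, String)

  // Hypothetical sample rows standing in for the result of p1.leftOuterJoin(p2).
  val joined: Seq[(Key, (Int, Option[Int]))] = Seq(
    (("1001-150329-002-0-04624", "5060567"), (1, None)),
    (("1005-150814-005-0-05885", "5060656"), (1, Some(1)))
  )

  // Drop the left-side value and unwrap the Option, defaulting to 0 for None.
  def replaceNone(rows: Seq[(Key, (Int, Option[Int]))]): Seq[(Key, Int)] =
    rows.map { case (k, (_, o)) => (k, o.getOrElse(0)) }

  def main(args: Array[String]): Unit =
    replaceNone(joined).foreach(println)
}
```

The pattern match discards the left-side count (the 1 in every row) because the desired output keeps only the key and the right-side value.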
