Null value Left Outer Join in Spark
I have two RDDs, p1 and p2, and performed a left outer join on them:
p1.leftOuterJoin(p2)
The result looks like this:
Array[((String, String), (Int, Option[Int]))] =
Array(
((1001-150329-002-0-04624,5060567),(1,None)),
((1002-141105-008-0-01934,10145500),(1,None)),
((1013-150324-009-0-02270,15750046),(1,None)),
((1005-150814-005-0-05885,5060656),(1,Some(1))),
((1009-150318-004-0-02537,5060583),(1,None)))
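The None values come from the left outer join itself: Spark's leftOuterJoin keeps every key of p1 and wraps the right-hand value in an Option, which is None when p2 has no entry for that key. The sketch below imitates that behaviour with plain Scala collections instead of RDDs, so it runs without Spark. The helper leftOuterJoinLocal and the sample keys are hypothetical illustrations, not part of Spark's API.

```scala
object LeftOuterJoinSketch {
  // Hypothetical local stand-in for RDD.leftOuterJoin (not a Spark API):
  // every (k, v) in `left` is kept; v is paired with Some(w) for each matching
  // right value w, or with None when `right` has no entry for k.
  def leftOuterJoinLocal[K, V, W](
      left: Seq[(K, V)],
      right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Group the right-hand values by key for lookup.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

    left.flatMap { case (k, v) =>
      val matches: Seq[Option[W]] = rightByKey.get(k) match {
        case Some(ws) => ws.map(w => Some(w))
        case None     => Seq(None)
      }
      matches.map(o => (k, (v, o)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data shaped like p1 and p2 in the question.
    val p1 = Seq((("a", "1"), 1), (("b", "2"), 1))
    val p2 = Seq((("b", "2"), 1))
    // Key ("a", "1") has no match in p2, so its right-hand side is None.
    leftOuterJoinLocal(p1, p2).foreach(println)
  }
}
```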
I want to replace every None with 0 and get a clean data set like this:
Array(
((1001-150329-002-0-04624,5060567),0),
((1002-141105-008-0-01934,10145500),0),
((1013-150324-009-0-02270,15750046),0),
((1005-150814-005-0-05885,5060656),1),
((1009-150318-004-0-02537,5060583),0))
Basically, I want to replace every (1,None) with 0 and every (1,Some(1)) with 1.
If you are looking for the value of the Option when it is Some, or 0 when it is None, I would implement it with .map and .getOrElse:
val a = p1.leftOuterJoin(p2)
a.map { case (k, (_, o)) => (k, o.getOrElse(0)) } // keep the key, drop the left value, None -> 0
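To check the transformation without a Spark cluster, the same map and getOrElse can be applied to a local Array shaped like the join result. This is a minimal sketch assuming the three sample rows below (taken from the question's output) stand in for the RDD; the object and method names are hypothetical.

```scala
object ReplaceNoneSketch {
  // Local Array shaped like the leftOuterJoin result: (key, (leftValue, Option[rightValue])).
  val joined: Array[((String, String), (Int, Option[Int]))] = Array(
    (("1001-150329-002-0-04624", "5060567"), (1, None)),
    (("1005-150814-005-0-05885", "5060656"), (1, Some(1))),
    (("1009-150318-004-0-02537", "5060583"), (1, None))
  )

  // Same transformation as the answer: keep the key, drop the left value,
  // and unwrap the Option, substituting 0 for None.
  def clean(rows: Array[((String, String), (Int, Option[Int]))]): Array[((String, String), Int)] =
    rows.map { case (k, (_, o)) => (k, o.getOrElse(0)) }

  def main(args: Array[String]): Unit =
    clean(joined).foreach(println)
}
```

On the real pair RDD, a.mapValues { case (_, o) => o.getOrElse(0) } should give the same result; unlike map, mapValues leaves the keys untouched and retains the parent RDD's partitioner, which can save a shuffle if the result is joined or grouped by key again.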
The result matches the expected one:
Array(
((1001-150329-002-0-04624,5060567),0),
((1002-141105-008-0-01934,10145500),0),
((1013-150324-009-0-02270,15750046),0),
((1005-150814-005-0-05885,5060656),1),
((1009-150318-004-0-02537,5060583),0))