How to join two RDDs with mutually exclusive keys
Say I have two Spark RDDs with the following values
x = [(1, 3), (2, 4)]
and
y = [(3, 5), (4, 7)]
and I want to have
z = [(1, 3), (2, 4), (3, 5), (4, 7)]
How can I achieve this? I know you can use outerJoin followed by map, but is there a more direct way?
rdd.union(otherRDD)
gives you the union of the two RDDs, which is the result expected in the question:
x.union(y)
You can just use the + operator. In the context of lists, this is a concatenation operation. Note that the example below uses plain Python lists, not RDDs.
>>> x = [(1, 3), (2, 4)]
>>> y = [(3, 5), (4, 7)]
>>> z = x + y
>>> z
[(1, 3), (2, 4), (3, 5), (4, 7)]