
How to join two RDDs with mutually exclusive keys

Say I have two Spark RDDs with the following values:

x = [(1, 3), (2, 4)]

and

y = [(3, 5), (4, 7)]

and I want to have

z = [(1, 3), (2, 4), (3, 5), (4, 7)]

How can I achieve this? I know you can use an outer join (such as fullOuterJoin) followed by map, but is there a more direct way?

rdd.union(otherRDD) gives you the union of the two RDDs, which is what the question asks for:

x.union(y)

You can just use the + operator. In the context of lists, this is a concatenation operation.

>>> x = [(1, 3), (2, 4)]
>>> y = [(3, 5), (4, 7)]
>>> z = x + y
>>> z
[(1, 3), (2, 4), (3, 5), (4, 7)]
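Plain concatenation only gives the intended result if the keys really are mutually exclusive. As a rough sketch for ordinary Python lists (not RDDs), the check can be made explicit; `concat_disjoint` is a hypothetical helper name, not part of any library.

```python
def concat_disjoint(x, y):
    """Concatenate two lists of (key, value) pairs, refusing overlapping keys."""
    # Keys that appear in both inputs would make the result ambiguous.
    shared = {k for k, _ in x} & {k for k, _ in y}
    if shared:
        raise ValueError(f"keys present in both inputs: {sorted(shared)}")
    return x + y


z = concat_disjoint([(1, 3), (2, 4)], [(3, 5), (4, 7)])
```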
