
How to join two RDDs with mutually exclusive keys

Say I have two Spark RDDs with the following values

x = [(1, 3), (2, 4)]

and

y = [(3, 5), (4, 7)]

and I want to have

z = [(1, 3), (2, 4), (3, 5), (4, 7)]

How can I achieve this? I know you can use fullOuterJoin followed by a map to achieve this, but is there a more direct way?

rdd.union(otherRDD) gives you the union of the two RDDs, which is what the question asks for:

x.union(y)
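For contrast, here is a plain-Python simulation of the fullOuterJoin-plus-map route the question mentions (this is a hypothetical helper illustrating the join semantics, not actual Spark code). Because the keys are mutually exclusive, every joined pair has exactly one `None` side, which is why the extra map step is needed, and why `union` is the more direct choice:

```python
def full_outer_join(a, b):
    """Simulate RDD.fullOuterJoin on lists of (key, value) pairs.

    Returns (key, (left_value_or_None, right_value_or_None)) for
    every key that appears in either input.
    """
    da, db = dict(a), dict(b)
    keys = set(da) | set(db)
    return [(k, (da.get(k), db.get(k))) for k in keys]

x = [(1, 3), (2, 4)]
y = [(3, 5), (4, 7)]

joined = full_outer_join(x, y)
# With mutually exclusive keys, exactly one side of each pair is None,
# so a map step must pick out the non-None value:
z = [(k, a if a is not None else b) for k, (a, b) in joined]
```

`union` skips the shuffle a join implies and simply concatenates the two RDDs' partitions, so it is both simpler and cheaper here.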

If x and y are plain Python lists (as written in the question), you can just use the + operator; for lists, this is a concatenation operation.

>>> x = [(1, 3), (2, 4)]
>>> y = [(3, 5), (4, 7)]
>>> z = x + y
>>> z
[(1, 3), (2, 4), (3, 5), (4, 7)]
