Merging elements into arrays in an RDD

Question

How can I convert an RDD[(Int, Int)] to an RDD[Array[(Int, Int)]], combining the elements that share a key?

Let's say I have

(0,0),(1,0),(1,1),(0,1)

and I want an array arr1 = ((0,0),(1,0)) and an array arr2 = ((1,1),(0,1)), so that the resulting RDD has arr1 and arr2 as its elements.

Answer 1

What you're basically trying to do is group an RDD[TupleN] by its i-th element. You can use

rdd.groupBy(_._1)

to create an

RDD[(K, Iterable[TupleN])]

where the key K is the i-th element (i.e., 0 or 1 in your example). Note that _._1 groups by the first element; the arrays in your example share their second element, so there you would use _._2.

Then you can map the values to arrays with mapValues(_.toArray), and call .values to drop the keys if you want an RDD[Array[(Int, Int)]].
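Putting the steps together, here is a minimal sketch of how this might look end to end. It assumes a local SparkContext, and the names GroupIntoArrays and groupBySecond are made up for illustration. It groups by the second element (_._2) because that is what produces the arr1 and arr2 from the question; swap in _._1 to group by the first element instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object GroupIntoArrays {

  // Groups the pairs by their second element and drops the grouping key.
  def groupBySecond(rdd: RDD[(Int, Int)]): RDD[Array[(Int, Int)]] =
    rdd
      .groupBy(_._2)         // RDD[(Int, Iterable[(Int, Int)])]
      .mapValues(_.toArray)  // RDD[(Int, Array[(Int, Int)])]
      .values                // RDD[Array[(Int, Int)]]

  def main(args: Array[String]): Unit = {
    // Assumption: run locally, using all available cores.
    val conf = new SparkConf().setAppName("group-into-arrays").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq((0, 0), (1, 0), (1, 1), (0, 1)))
      val arrays = groupBySecond(pairs)
      // Print each group; the order of groups and of elements within a group is not guaranteed.
      arrays.collect().foreach(arr => println(arr.mkString("(", ", ", ")")))
    } finally {
      sc.stop()
    }
  }
}
```

groupBy shuffles every element across the cluster, so for large inputs it may be worth checking whether you really need the full arrays or whether an aggregation such as reduceByKey would do.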
