Creating a pair RDD in Spark using Scala

Question

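I'm new to Spark, and I just need to create an RDD of two-element pairs.

Array1 = ((1,1),(1,2),(1,3),(2,1),(2,2),(2,3))

When I run groupByKey, the output is ((1,(1,2,3)),(2,(1,2,3)))

But I need the output to pair each key with only 2 values at a time. I'm not sure how to get that.

Expected Output = ((1,(1,2)),(1,(1,3)),(1,(2,3)),(2,(1,2)),(2,(1,3)),(2,(2,3)))

Each pair of values should be printed only once. There should be only (1,2) and not (2,1), or likewise (2,3) and not (3,4).

Thanks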

Answer 1

You can get the desired result as follows:

import org.apache.spark.rdd.RDD

// Prior to doing the `groupBy`, you have an RDD[(Int, Int)], x, containing:
//   (1,1),(1,2),(1,3),(2,1),(2,2),(2,3)
//
// You can simply map the values as below. The result is an RDD[(Int, (Int, Int))].
// `sc` is the SparkContext (predefined in spark-shell).
val x: RDD[(Int, Int)] = sc.parallelize(Seq((1,1),(1,2),(1,3),(2,1),(2,2),(2,3)))
val y: RDD[(Int, (Int, Int))] = x.map(t => (t._1, t)) // Key each tuple by its first value
y.collect // Get result as an array
// res0: Array[(Int, (Int, Int))] = Array((1,(1,1)), (1,(1,2)), (1,(1,3)), (2,(2,1)), (2,(2,2)), (2,(2,3)))

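That is, the result is a pair RDD that associates the key (the first value of each pair) with the pair itself (as a tuple). Don't use groupBy, because in this case it won't give you what you need.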
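To try the same mapping without a running SparkContext, here is a minimal sketch on a plain Scala Seq. The object and method names (KeyByFirstSketch, keyByFirst) are made up for illustration and are not part of Spark; only the lambda t => (t._1, t) comes from the answer above.

```scala
object KeyByFirstSketch {
  // Pair each tuple with its first element as the key,
  // mirroring x.map(t => (t._1, t)) from the RDD version.
  def keyByFirst(pairs: Seq[(Int, Int)]): Seq[(Int, (Int, Int))] =
    pairs.map(t => (t._1, t))

  def main(args: Array[String]): Unit = {
    val x = Seq((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3))
    // Prints one keyed tuple per line, e.g. (1,(1,1))
    keyByFirst(x).foreach(println)
  }
}
```

Because map works the same way on a local collection and on an RDD, this is a convenient way to check the shape of the result before running it on a cluster.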

Answer 2

If I understand your requirement correctly, you can use groupByKey and flatMapValues to flatten the 2-combinations of the grouped values, as follows:

val rdd = sc.parallelize(Seq(
  (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)
))

rdd.groupByKey.flatMapValues(_.toList.combinations(2)).
  map{ case (k, v) => (k, (v(0), v(1))) }.
  collect
// res1: Array[(Int, (Int, Int))] =
//   Array((1,(1,2)), (1,(1,3)), (1,(2,3)), (2,(1,2)), (2,(1,3)), (2,(2,3)))
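The same idea can be sketched on plain Scala collections, which makes it easy to test without Spark. The names below (PairCombinationsSketch, pairsPerKey) are hypothetical. One deliberate difference from the snippet above: the values are sorted before combinations(2) is called. Spark's groupByKey does not guarantee the order of values within a group, so sorting is one way to make sure the output contains (1,2) rather than (2,1); in the RDD version this would presumably be `_.toList.sorted.combinations(2)`.

```scala
object PairCombinationsSketch {
  // For each key, emit every 2-combination of its values,
  // with the smaller value first in each pair.
  def pairsPerKey(data: Seq[(Int, Int)]): Seq[(Int, (Int, Int))] =
    data
      .groupBy(_._1)          // Map[key, Seq[(key, value)]]
      .toSeq
      .sortBy(_._1)           // Map iteration order is unspecified, so sort the keys
      .flatMap { case (k, kvs) =>
        kvs.map(_._2)
          .sorted             // sort values so each pair comes out as (small, large)
          .combinations(2)    // Iterator over all 2-element selections
          .map(v => (k, (v(0), v(1))))
          .toList
      }

  def main(args: Array[String]): Unit = {
    // Input order is shuffled on purpose to show that sorting fixes the pair order.
    val data = Seq((2, 3), (1, 3), (1, 1), (2, 1), (1, 2), (2, 2))
    pairsPerKey(data).foreach(println)
  }
}
```

A group with fewer than two values produces no pairs, since combinations(2) yields nothing for it.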
