reduceByKey RDD Spark Scala

Question

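I have an RDD[(String, String, Int)] and want to use reduceByKey to get the result shown below. I don't want to convert it to a DataFrame and then do a groupBy to get the result. The Int is a constant and its value is always 1.

Is it possible to use reduceByKey here to get the result? The data is shown in tabular form for readability.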

Input

String	String	Int
First	Apple	1
Second	Banana	1
First	Flower	1
Third	Tree	1

Result

String	String	Int
First	Apple,Flower	2
Second	Banana	1
Third	Tree	1

Answer 1

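You can't use reduceByKey on an RDD of Tuple3, because reduceByKey is only defined on pair RDDs (RDD[(K, V)]). If you had an RDD[(String, String)], you could use it.

Also, you could apply reduceByKey after a groupBy, but at that point the keys are already unique, so calling reduceByKey would be pointless. That's why we use map instead, to transform each grouped entry one-to-one.

So, assuming df is your main table (an RDD[(String, String, Int)], despite the name), this code: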

val rest = df.groupBy(x => x._1).map(x => {
  val key = x._1 // ex: First
  val groupedData = x._2 // ex: [(First, Apple, 1), (First, Flower, 1)]

  // ex: [(First, Apple, 1), (First, Flower, 1)] => [Apple, Flower] => Apple, Flower
  val concat = groupedData.map(d => d._2).mkString(",")
  // ex: [(First, Apple, 1), (First, Flower, 1)] => [1, 1] => 2
  val sum = groupedData.map(d => d._3).sum

  (key, concat, sum) // return a tuple3 again, same format
})

returns this result:

(Second,Banana,1)
(First,Apple,Flower,2)
(Third,Tree,1)
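
As for the reduceByKey route mentioned at the top of this answer, here is a rough sketch of how it could look if you first reshape each Tuple3 into a (key, (value, count)) pair. The object name ReduceByKeySketch, the helper aggregate, and the local SparkSession setup are my own assumptions for illustration, not something from the question:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ReduceByKeySketch {

  // Hypothetical helper: collapses rows that share the first field.
  def aggregate(rdd: RDD[(String, String, Int)]): RDD[(String, String, Int)] =
    rdd
      // Reshape into a pair RDD so that reduceByKey becomes available
      .map { case (key, value, count) => (key, (value, count)) }
      // Merge two entries for the same key: join the names, add the counts
      .reduceByKey((a, b) => (a._1 + "," + b._1, a._2 + b._2))
      // Flatten back into the original Tuple3 shape
      .map { case (key, (values, count)) => (key, values, count) }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying this out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reduceByKey-sketch")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(Seq(
      ("First", "Apple", 1),
      ("Second", "Banana", 1),
      ("First", "Flower", 1),
      ("Third", "Tree", 1)
    ))

    aggregate(rdd).collect().foreach(println)
    spark.stop()
  }
}
```

Note that neither the order of the output rows nor the order of the names inside the joined string is guaranteed, so you may get Flower,Apple instead of Apple,Flower. Unlike groupBy, reduceByKey combines values on each partition before the shuffle, which usually matters on larger data.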

Good luck!
