Using Map For Aggregation in Spark JAVA
I have a dataset with the following columns:
dataset.show();
+---------+------+--------------+------+--------------------------+
| Col1    | Col2 | Acceleration | Mass | Force(Acceleration*Mass) |
+---------+------+--------------+------+--------------------------+
| weight1 | ex1  | 10           | 5    | 50                       |
| weight1 | ex2  | 8            | 4    | 32                       |
| weight2 | ex1  | 5            | 3    | 15                       |
| weight2 | ex2  | 9            | 4    | 36                       |
+---------+------+--------------+------+--------------------------+
I use an aggMap as follows:
Map<String, String> aggMap = new HashMap<>();
aggMap.put("Acceleration", "sum");
aggMap.put("Mass", "sum");
For Force, I always want it computed as Acceleration*Mass. How can I pass that through the aggMap? (I am not passing it here because I could not figure out how.)
In Java I do the groupBy as:
dataset = dataset.groupBy(col("Col1")).agg(aggMap);
The result I get is:
+---------+-------------------+-----------+
| Col1    | sum(Acceleration) | sum(Mass) |
+---------+-------------------+-----------+
| weight1 | 18                | 9         |
| weight2 | 14                | 7         |
+---------+-------------------+-----------+
But I need the columns renamed, i.e. sum(Acceleration) as Acceleration and sum(Mass) as Mass, and I want the Force column computed as part of the aggregation and listed as Force:
+---------+--------------+------+-------+
| Col1    | Acceleration | Mass | Force |
+---------+--------------+------+-------+
| weight1 | 18           | 9    | 162   |
| weight2 | 14           | 7    | 98    |
+---------+--------------+------+-------+
How can I achieve this? The reason I use a Map is that I receive the column names (Force, Mass, Acceleration, ...) dynamically, and not every one needs to be computed each time. So I check whether only Acceleration, only Mass, both, or all of them are required.
I hope this is what you are looking for:
import org.apache.spark.sql.functions._

val df = Seq(
  ("weight1", "ex1", 10, 5),
  ("weight1", "ex2", 8, 4),
  ("weight2", "ex1", 5, 3),
  ("weight2", "ex2", 9, 4)
).toDF("Col1", "Col2", "Acceleration", "Mass")

val newDF = df.groupBy($"Col1")
  .agg(sum($"Acceleration").as("Acceleration"), sum($"Mass").as("Mass"))
  .withColumn("Force", $"Acceleration" * $"Mass")

newDF.show(false)
Output:
+-------+------------+----+-----+
|Col1 |Acceleration|Mass|Force|
+-------+------------+----+-----+
|weight2|14 |7 |98 |
|weight1|18 |9 |162 |
+-------+------------+----+-----+
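The answer above is Scala, while the question asks about Java. Below is a minimal Java sketch of the same idea (the class name ForceAggregation and the requested list are my own illustration, assuming Spark 2.x or later). Instead of the string-based aggMap, it builds typed Column expressions, which allows aliasing each sum back to its plain column name, and it only derives Force when both inputs were requested, mirroring the dynamic-column check described in the question:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ForceAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ForceAggregation")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("Col1", DataTypes.StringType)
                .add("Col2", DataTypes.StringType)
                .add("Acceleration", DataTypes.IntegerType)
                .add("Mass", DataTypes.IntegerType);

        Dataset<Row> df = spark.createDataFrame(Arrays.asList(
                RowFactory.create("weight1", "ex1", 10, 5),
                RowFactory.create("weight1", "ex2", 8, 4),
                RowFactory.create("weight2", "ex1", 5, 3),
                RowFactory.create("weight2", "ex2", 9, 4)), schema);

        // Build sum expressions dynamically from the column names that are
        // actually required this run (plays the role of aggMap).
        List<String> requested = Arrays.asList("Acceleration", "Mass");
        List<Column> aggs = new ArrayList<>();
        for (String name : requested) {
            // sum(name) aliased back to the plain column name
            aggs.add(sum(col(name)).as(name));
        }

        // agg(Column, Column...) takes the first expression plus varargs.
        Dataset<Row> result = df.groupBy(col("Col1"))
                .agg(aggs.get(0), aggs.subList(1, aggs.size()).toArray(new Column[0]));

        // Derive Force only when both of its inputs were aggregated.
        if (requested.contains("Acceleration") && requested.contains("Mass")) {
            result = result.withColumn("Force",
                    col("Acceleration").multiply(col("Mass")));
        }

        result.show(false);
        spark.stop();
    }
}
```

With both columns requested, this reproduces the Force values 162 and 98 shown in the output above; with only one of them requested, it simply returns that aggregated column without Force.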