Using Map For Aggregation in Spark JAVA
I have a dataset with the following columns:
dataset.show();
+---------+------+--------------+------+--------------------------+
| Col1    | Col2 | Acceleration | Mass | Force(Acceleration*Mass) |
+---------+------+--------------+------+--------------------------+
| weight1 | ex1  | 10           | 5    | 50                       |
| weight1 | ex2  | 8            | 4    | 32                       |
| weight2 | ex1  | 5            | 3    | 15                       |
| weight2 | ex2  | 9            | 4    | 36                       |
+---------+------+--------------+------+--------------------------+
I use an aggMap as follows:
Map<String, String> aggMap = new HashMap<>();
aggMap.put("Acceleration", "sum");
aggMap.put("Mass", "sum");
For Force, I always want it computed as Acceleration*Mass. How can I pass that through the aggMap? (I am not passing it here because I could not figure out how.)
In Java I do the groupBy as:
dataset = dataset.groupBy(col("Col1")).agg(aggMap);
The result I get is:
+---------+-------------------+-----------+
| Col1    | sum(Acceleration) | sum(Mass) |
+---------+-------------------+-----------+
| weight1 | 18                | 9         |
| weight2 | 14                | 7         |
+---------+-------------------+-----------+
But I need the columns renamed, i.e. sum(Acceleration) as Acceleration and sum(Mass) as Mass, and I want the Force column computed as part of the aggregation and listed as Force:
+---------+--------------+------+-------+
| Col1    | Acceleration | Mass | Force |
+---------+--------------+------+-------+
| weight1 | 18           | 9    | 162   |
| weight2 | 14           | 7    | 98    |
+---------+--------------+------+-------+
How can I achieve this? The reason I use a Map is that I receive the column names (Force, Mass, Acceleration, ...) dynamically, and not every one needs to be computed each time. So I check whether only Acceleration, only Mass, both, or all of them are required.
I hope this is what you are looking for:
import org.apache.spark.sql.functions._

val df = Seq(
  ("weight1", "ex1", 10, 5),
  ("weight1", "ex2", 8, 4),
  ("weight2", "ex1", 5, 3),
  ("weight2", "ex2", 9, 4)
).toDF("Col1", "Col2", "Acceleration", "Mass")

val newDF = df.groupBy($"Col1")
  .agg(sum($"Acceleration").as("Acceleration"), sum($"Mass").as("Mass"))
  .withColumn("Force", $"Acceleration" * $"Mass")

newDF.show(false)
Output:
+-------+------------+----+-----+
|Col1 |Acceleration|Mass|Force|
+-------+------------+----+-----+
|weight2|14 |7 |98 |
|weight1|18 |9 |162 |
+-------+------------+----+-----+
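The answer above is Scala, while the question asks about Java. Below is a minimal Java sketch of the same idea (the class name ForceAggregation and the requested list are my own illustration, assuming Spark 2.x or later). Instead of the string-based aggMap, it builds typed Column expressions, which allows aliasing each sum back to its plain column name, and it only derives Force when both inputs were requested, mirroring the dynamic-column check described in the question:

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ForceAggregation {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ForceAggregation")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("Col1", DataTypes.StringType)
                .add("Col2", DataTypes.StringType)
                .add("Acceleration", DataTypes.IntegerType)
                .add("Mass", DataTypes.IntegerType);

        Dataset<Row> df = spark.createDataFrame(Arrays.asList(
                RowFactory.create("weight1", "ex1", 10, 5),
                RowFactory.create("weight1", "ex2", 8, 4),
                RowFactory.create("weight2", "ex1", 5, 3),
                RowFactory.create("weight2", "ex2", 9, 4)), schema);

        // Build sum expressions dynamically from the column names that are
        // actually required this run (plays the role of aggMap).
        List<String> requested = Arrays.asList("Acceleration", "Mass");
        List<Column> aggs = new ArrayList<>();
        for (String name : requested) {
            // sum(name) aliased back to the plain column name
            aggs.add(sum(col(name)).as(name));
        }

        // agg(Column, Column...) takes the first expression plus varargs.
        Dataset<Row> result = df.groupBy(col("Col1"))
                .agg(aggs.get(0), aggs.subList(1, aggs.size()).toArray(new Column[0]));

        // Derive Force only when both of its inputs were aggregated.
        if (requested.contains("Acceleration") && requested.contains("Mass")) {
            result = result.withColumn("Force",
                    col("Acceleration").multiply(col("Mass")));
        }

        result.show(false);
        spark.stop();
    }
}
```

With both columns requested, this reproduces the Force values 162 and 98 shown in the output above; with only one of them requested, it simply returns that aggregated column without Force.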