
Using Map For Aggregation in Spark JAVA

I have a dataset with the following columns:

dataset.show();
+---------+------+--------------+------+--------------------------+
|  Col1   | Col2 | Acceleration | Mass | Force(Acceleration*Mass) |
+---------+------+--------------+------+--------------------------+
| weight1 | ex1  |           10 |    5 |                       50 |
| weight1 | ex2  |            8 |    4 |                       32 |
| weight2 | ex1  |            5 |    3 |                       15 |
| weight2 | ex2  |            9 |    4 |                       36 |
+---------+------+--------------+------+--------------------------+

I use an aggMap as follows:

Map<String, String> aggMap = new HashMap<>(); // java.util.Map / java.util.HashMap
aggMap.put("Acceleration","sum");
aggMap.put("Mass","sum");

For Force, I always want it to be computed as Acceleration*Mass. How can I pass that in aggMap? (I have not passed it here because I could not figure out how.)
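A Map passed to agg pairs a column name with the name of an aggregate function, so no entry can express Acceleration*Mass directly. The order of operations also matters: deriving Force from the summed columns gives 18 * 9 = 162 for weight1, while summing the per-row Force column gives 50 + 32 = 82. The plain-Java sketch below (no Spark dependency; the class and method names are hypothetical) reproduces both results from the sample rows so the difference is easy to see.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ForceSemantics {
    // One input row; Col2 is omitted because it does not take part in the aggregation.
    static final class Row {
        final String col1;
        final long acceleration;
        final long mass;

        Row(String col1, long acceleration, long mass) {
            this.col1 = col1;
            this.acceleration = acceleration;
            this.mass = mass;
        }
    }

    // The sample rows from the question.
    static final List<Row> ROWS = List.of(
        new Row("weight1", 10, 5),
        new Row("weight1", 8, 4),
        new Row("weight2", 5, 3),
        new Row("weight2", 9, 4));

    // Aggregate first, then derive: sum(Acceleration) * sum(Mass) per group.
    static Map<String, Long> productOfSums(List<Row> rows) {
        Map<String, long[]> sums = new TreeMap<>();
        for (Row r : rows) {
            long[] s = sums.computeIfAbsent(r.col1, k -> new long[2]);
            s[0] += r.acceleration;
            s[1] += r.mass;
        }
        Map<String, Long> force = new TreeMap<>();
        sums.forEach((k, s) -> force.put(k, s[0] * s[1]));
        return force;
    }

    // Derive first, then aggregate: sum(Acceleration * Mass) per group.
    static Map<String, Long> sumOfProducts(List<Row> rows) {
        Map<String, Long> force = new TreeMap<>();
        for (Row r : rows) {
            force.merge(r.col1, r.acceleration * r.mass, Long::sum);
        }
        return force;
    }

    public static void main(String[] args) {
        System.out.println(productOfSums(ROWS)); // {weight1=162, weight2=98}
        System.out.println(sumOfProducts(ROWS)); // {weight1=82, weight2=51}
    }
}
```

In Spark terms, the first variant corresponds to aggregating and then adding Force with withColumn; the second corresponds to aggregating a per-row product such as sum(col("Acceleration").multiply(col("Mass"))).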

In Java I do the groupBy as:

// groupBy directly: a select(col("Col1")) first would drop Acceleration and Mass before the aggregation
dataset = dataset.groupBy(col("Col1")).agg(aggMap);

As a result I get:

+---------+-------------------+-----------+
|  Col1   | sum(Acceleration) | sum(Mass) |
+---------+-------------------+-----------+
| weight1 |                18 |         9 |
| weight2 |                14 |         7 |
+---------+-------------------+-----------+

But these columns need to be renamed: sum(Acceleration) as Acceleration and sum(Mass) as Mass. I also want the Force column to be computed in the aggregation and named Force:

+---------+--------------+------+-------+
|  Col1   | Acceleration | Mass | Force |
+---------+--------------+------+-------+
| weight1 |           18 |    9 |   162 |
| weight2 |           14 |    7 |    98 |
+---------+--------------+------+-------+

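How can I achieve this? I am using a Map because I get the column names (Force, Mass, Acceleration, ...) dynamically, and not all of them need to be computed every time. So I check whether only Acceleration, only Mass, both, or all of them are needed.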
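Because the requested columns arrive dynamically, one option is to keep the Map for the plain sums and treat Force as a derived column: when Force is requested, make sure both of its inputs are aggregated, then add Force after agg. The helper below is a hypothetical, Spark-free sketch of that planning step; its result has the same Map<String, String> shape (column name to function name) that the question passes to agg.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggPlanner {
    // Hypothetical helper: builds the column -> aggregate-function map passed to agg(Map).
    // Force is not an input column, so it never goes into the map; when it is requested,
    // its inputs (Acceleration and Mass) are summed so Force can be derived afterwards.
    static Map<String, String> planAggMap(Collection<String> requested) {
        Map<String, String> aggMap = new LinkedHashMap<>();
        for (String column : requested) {
            if (column.equals("Force")) {
                aggMap.put("Acceleration", "sum");
                aggMap.put("Mass", "sum");
            } else {
                aggMap.put(column, "sum");
            }
        }
        return aggMap;
    }

    // True when Force has to be added with withColumn after the aggregation.
    static boolean needsForce(Collection<String> requested) {
        return requested.contains("Force");
    }

    public static void main(String[] args) {
        System.out.println(planAggMap(List.of("Mass")));          // {Mass=sum}
        System.out.println(planAggMap(List.of("Mass", "Force"))); // {Mass=sum, Acceleration=sum}
    }
}
```

With Spark, the map would go to groupBy(col("Col1")).agg(...) as in the question; if needsForce returns true, Force is then added with withColumn once the summed columns carry their plain names, and any input pulled in only for Force can be removed with drop.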

I hope this is what you are looking for (the example is in Scala):

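import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and $"..."; assumes a SparkSession named spark, as in spark-shell

val df = Seq(
  ("weight1", "ex1", 10, 5),
  ("weight1", "ex2", 8, 4),
  ("weight2", "ex1", 5, 3),
  ("weight2", "ex2", 9, 4)
).toDF("Col1", "Col2", "Acceleration", "Mass")

// Aggregate first, aliasing each sum back to its original column name,
// then derive Force from the aggregated columns.
val newDF = df.groupBy($"Col1")
  .agg(sum($"Acceleration").as("Acceleration"), sum($"Mass").as("Mass"))
  .withColumn("Force", $"Acceleration" * $"Mass")

newDF.show(false)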

Output:

+-------+------------+----+-----+
|Col1   |Acceleration|Mass|Force|
+-------+------------+----+-----+
|weight2|14          |7   |98   |
|weight1|18          |9   |162  |
+-------+------------+----+-----+
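If you would rather keep the Map-based agg from the question, it names each result column function(column), e.g. sum(Acceleration), as the question's output shows. A hypothetical helper can compute those generated names so each one can be renamed back to the plain column name. This relies on the naming seen in the question for sum; Spark may report some function names differently (for example, averages appear as avg(...)), so checking against dataset.columns() is a sensible precaution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggRenames {
    // Hypothetical helper: maps each generated column name, e.g. "sum(Acceleration)",
    // back to the plain column name. Assumes the "function(column)" naming seen
    // in the sum(Acceleration) / sum(Mass) output of agg(aggMap).
    static Map<String, String> generatedToAlias(Map<String, String> aggMap) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : aggMap.entrySet()) {
            renames.put(e.getValue() + "(" + e.getKey() + ")", e.getKey());
        }
        return renames;
    }

    public static void main(String[] args) {
        Map<String, String> aggMap = new LinkedHashMap<>();
        aggMap.put("Acceleration", "sum");
        aggMap.put("Mass", "sum");
        // prints {sum(Acceleration)=Acceleration, sum(Mass)=Mass}
        System.out.println(generatedToAlias(aggMap));
    }
}
```

Each pair would then be applied in Java with dataset = dataset.withColumnRenamed(generated, alias), after which Force can be added with dataset.withColumn("Force", col("Acceleration").multiply(col("Mass"))).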


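Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.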

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM