Using groupBy and agg on multiple columns in Spark Scala

Question

I have a DataFrame with 4 columns. I want to group by 2 of the columns and collect the other columns as lists. For example, I have a DataFrame like this:

+---+-------+--------+-----------+
|id |fName  |lName   |dob        |
+---+-------+--------+-----------+
|1  |Akash  |Sethi   |23-05-1995 |
|2  |Kunal  |Kapoor  |14-10-1992 |
|3  |Rishabh|Verma   |11-08-1994 |
|2  |Sonu   |Mehrotra|14-10-1992 |
+---+-------+--------+-----------+

and I want my output to look like this:

+---+-----------+----------------+--------------------+
|id |dob        |fname           |lName               |
+---+-----------+----------------+--------------------+
|1  |23-05-1995 |[Akash]         |[Sethi]             |
|2  |14-10-1992 |[Kunal, Sonu]   |[Kapoor, Mehrotra]  |
|3  |11-08-1994 |[Rishabh]       |[Verma]             |
+---+-----------+----------------+--------------------+

Answer 1

You can do this with agg, passing one collect_list per column you want gathered into a list:

import org.apache.spark.sql.functions.{col, collect_list}
df.groupBy("id", "dob").agg(collect_list(col("fname")).as("fname"), collect_list(col("lName")).as("lName"))
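
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample DataFrame from the question and applies the same groupBy/agg. The object name `GroupByAggExample`, the helper `groupNames`, the app name, and the `local[*]` master are assumptions made for illustration and are not part of the original answer. The `orderBy("id")` is only there to make the printed rows stable. As far as I know, Spark does not guarantee the order of elements inside each collected list after a shuffle.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

object GroupByAggExample {

  // Hypothetical helper: groups by (id, dob) and collects the name columns into lists.
  def groupNames(df: DataFrame): DataFrame =
    df.groupBy("id", "dob")
      .agg(
        collect_list(col("fName")).as("fname"), // alias so the column is not named collect_list(fName)
        collect_list(col("lName")).as("lName")
      )
      .orderBy("id")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("groupBy-agg-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val df = Seq(
      (1, "Akash", "Sethi", "23-05-1995"),
      (2, "Kunal", "Kapoor", "14-10-1992"),
      (3, "Rishabh", "Verma", "11-08-1994"),
      (2, "Sonu", "Mehrotra", "14-10-1992")
    ).toDF("id", "fName", "lName", "dob")

    // Print full cell contents instead of truncating long lists.
    groupNames(df).show(truncate = false)

    spark.stop()
  }
}
```

If duplicates within a group should be dropped, `collect_set` can be used in place of `collect_list`.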

Solution 1 (2, accepted, 2020-08-13 08:12:07)
