Use groupBy and agg on multiple columns in Spark Scala
I have a DataFrame with 4 columns. I want to apply groupBy on the basis of 2 columns and collect the other columns as lists. Example: I have a DF like this
+---+-------+--------+-----------+
|id |fName |lName |dob |
+---+-------+--------+-----------+
|1 |Akash |Sethi |23-05-1995 |
|2 |Kunal |Kapoor |14-10-1992 |
|3 |Rishabh|Verma |11-08-1994 |
|2 |Sonu |Mehrotra|14-10-1992 |
+---+-------+--------+-----------+
and I want my output like this:
+---+-----------+--------------+-------------------+
|id |dob        |fName         |lName              |
+---+-----------+--------------+-------------------+
|1  |23-05-1995 |[Akash]       |[Sethi]            |
|2  |14-10-1992 |[Kunal, Sonu] |[Kapoor, Mehrotra] |
|3  |11-08-1994 |[Rishabh]     |[Verma]            |
+---+-----------+--------------+-------------------+
You can do something like this using agg:
df.groupBy("id","dob").agg(collect_list(col("fname")),collect_list(col("lName")))