Spark (JAVA) - dataframe groupBy with multiple aggregations?
I'm trying to write a groupBy on Spark with JAVA. In SQL this would look like
SELECT id, count(id) as count, max(date) maxdate
FROM table
GROUP BY id;
But what is the Spark/JAVA style equivalent of this query? Let's say the variable table is a dataframe, to keep the relation to the SQL query clear. I'm thinking something like:
table = table.select(table.col("id"), (table.col("id").count()).as("count"), (table.col("date").max()).as("maxdate")).groupby("id")
Which is obviously incorrect, since you can't call aggregate functions like .count or .max on columns, only on dataframes. So how is this done in Spark JAVA?
Thank you!
You could do this with org.apache.spark.sql.functions:
import org.apache.spark.sql.functions;
table.groupBy("id").agg(
    functions.count("id").as("count"),
    functions.max("date").as("maxdate")
).show();
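For completeness, here's a minimal self-contained sketch of the same aggregation, assuming a local SparkSession and a small made-up dataset (the schema and sample rows are hypothetical, just to make it runnable):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class GroupByExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("groupBy-example")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical sample data shaped like the question's table (id, date).
        StructType schema = new StructType()
                .add("id", DataTypes.IntegerType)
                .add("date", DataTypes.DateType);
        List<Row> rows = Arrays.asList(
                RowFactory.create(1, java.sql.Date.valueOf("2020-01-01")),
                RowFactory.create(1, java.sql.Date.valueOf("2020-03-15")),
                RowFactory.create(2, java.sql.Date.valueOf("2020-02-20")));
        Dataset<Row> table = spark.createDataFrame(rows, schema);

        // DataFrame-style equivalent of:
        //   SELECT id, count(id) AS count, max(date) AS maxdate FROM table GROUP BY id
        Dataset<Row> result = table.groupBy("id").agg(
                functions.count("id").as("count"),
                functions.max("date").as("maxdate"));
        result.show();

        spark.stop();
    }
}

The key difference from the attempt in the question is the order of operations: groupBy comes first and returns a grouped dataset, and the aggregate expressions go inside agg, so count and max are applied per group rather than being called as methods on a Column.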