Executing SQL Statements in spark-sql
I have a text file which is of the following format:
ID,Name,Rating
1,A,3
2,B,4
1,A,4
and I want to find the average rating for each ID in Spark. This is the code I have so far, but it keeps giving me an error:
val Avg_data=spark.sql("select ID, AVG(Rating) from table")
ERROR: org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and '`table`.`ID`' is not an aggregate function. Wrap '(avg(CAST(`table`.`Rating` AS BIGINT)) AS `avg(Rating)`)' in windowing function(s).........
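For context, here is a minimal sketch of how the file might be loaded and registered before running such a query (the path ratings.csv and the header option are assumptions; the question does not show this step):

val df = spark.read
  .option("header", "true")      // first line holds ID,Name,Rating
  .option("inferSchema", "true") // read Rating as a numeric type
  .csv("ratings.csv")            // hypothetical path to the text file
df.createOrReplaceTempView("table")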
AVG() is an aggregate function, so you need a GROUP BY clause as well:
val Avg_data=spark.sql("select ID, AVG(Rating) as average from table group by ID")
You should then have Avg_data as:
+---+-------+
|ID |average|
+---+-------+
|1 |3.5 |
|2 |4.0 |
+---+-------+
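Equivalently, the same aggregation can be expressed with the DataFrame API instead of SQL (a sketch, assuming df is the DataFrame that backs table):

import org.apache.spark.sql.functions.avg

// group by ID and compute the mean Rating per group
val avgDf = df.groupBy("ID").agg(avg("Rating").alias("average"))
avgDf.show()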
You need to use the GROUP BY clause along with avg().
1. DataFrame df:
+---+----+------+
| ID|Name|Rating|
+---+----+------+
| 1| A| 3|
| 2| B| 4|
| 1| A| 4|
+---+----+------+
2. Register df as a temp table and write a query with GROUP BY and AVG():
df.registerTempTable("table")
val avg_data=spark.sql("select ID,avg(Rating) from table group by ID")
avg_data.show
+---+-----------+
| ID|avg(Rating)|
+---+-----------+
| 1| 3.5|
| 2| 4.0|
+---+-----------+
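One side note: registerTempTable is deprecated since Spark 2.0 in favor of createOrReplaceTempView, which behaves the same here:

df.createOrReplaceTempView("table")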