
Executing SQL Statements in spark-sql

I have a text file which is of the following format:

ID,Name,Rating
1,A,3
2,B,4
1,A,4

and I want to find the average rating for each ID in Spark. This is the code I have so far, but it keeps giving me an error:

val Avg_data=spark.sql("select ID, AVG(Rating) from table")

ERROR: org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and '`table`.`ID`' is not an aggregate function. Wrap '(avg(CAST(`table`.`Rating` AS BIGINT)) AS `avg(Rating)`)' in windowing function(s)...

AVG() is an aggregate function, so you need a GROUP BY clause as well:

val Avg_data = spark.sql("select ID, AVG(Rating) as average from table group by ID")

You should then have Avg_data as:

+---+-------+
|ID |average|
+---+-------+
|1  |3.5    |
|2  |4.0    |
+---+-------+
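
For completeness, here is a minimal, self-contained sketch of the whole flow as it would run in spark-shell, assuming the input lives in a file named ratings.csv (a hypothetical path) with the header shown in the question:

// Read the comma-separated file; the first line is the header (ID,Name,Rating)
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("ratings.csv")   // hypothetical path to the text file

// Non-deprecated way (Spark 2.x+) to expose the DataFrame to SQL
df.createOrReplaceTempView("table")

val Avg_data = spark.sql("select ID, AVG(Rating) as average from table group by ID")
Avg_data.show()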

You need to use a GROUP BY clause along with AVG().

1. Given the DataFrame df:

+---+----+------+
| ID|Name|Rating|
+---+----+------+
|  1|   A|     3|
|  2|   B|     4|
|  1|   A|     4|
+---+----+------+

2. Register df as a temp table and write a query with GROUP BY and AVG():

df.registerTempTable("table") // deprecated since Spark 2.0; use createOrReplaceTempView("table") instead

val avg_data = spark.sql("select ID, avg(Rating) from table group by ID")

avg_data.show

+---+-----------+
| ID|avg(Rating)|
+---+-----------+
|  1|        3.5|
|  2|        4.0|
+---+-----------+
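
The same aggregation can also be expressed with the DataFrame API instead of a temp view; a short sketch, assuming df is the DataFrame shown above:

import org.apache.spark.sql.functions.avg

// Group by ID and average the Rating column, aliasing the result column
val avg_data = df.groupBy("ID").agg(avg("Rating").alias("average"))
avg_data.show()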
