I am trying to run some test Spark/Scala code that finds the employees whose salary is higher than the average salary, using the test DataFrame below. It fails at execution with:
Exception in thread "main" java.lang.UnsupportedOperationException: Cannot evaluate expression: avg(input[4, double, false])
What is the correct syntax to achieve this?
val dataDF20 = spark.createDataFrame(Seq(
(11, "emp1", 2, 45, 1000.0),
(12, "emp2", 1, 34, 2000.0),
(13, "emp3", 1, 33, 3245.0),
(14, "emp4", 1, 54, 4356.0),
(15, "emp5", 2, 76, 56789.0)
)).toDF("empid", "name", "deptid", "age", "sal")
val condition1 : Column = col("sal") > avg(col("sal"))
val d0 = dataDF20.filter(condition1)
println("------ d0.show()----", d0.show())
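avg is an aggregate function, so it cannot be evaluated row by row inside filter; that is what the exception is reporting. You can get this done in two steps, first computing the average and then filtering on it: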
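// Step 1: compute the average once as a plain Double ($ needs import spark.implicits._)
val avgVal = dataDF20.select(avg($"sal")).first().getDouble(0)
// Step 2: filter each row against that value
dataDF20.filter($"sal" > avgVal).show()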
+-----+----+------+---+-------+
|empid|name|deptid|age|    sal|
+-----+----+------+---+-------+
|   15|emp5|     2| 76|56789.0|
+-----+----+------+---+-------+
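If you would rather keep everything in a single DataFrame expression, one possible alternative is to attach the global average to every row with a window function and filter on that column. The following is only a sketch, not the answer's method: the local SparkSession setup, the helper column name avg_sal, and the names withAvg and aboveAvg are assumptions made for illustration. It is written as a script, for example to paste into spark-shell, where getOrCreate() simply reuses the existing session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("above-average-salary")
  .getOrCreate()

// Same test data as in the question.
val dataDF20 = spark.createDataFrame(Seq(
  (11, "emp1", 2, 45, 1000.0),
  (12, "emp2", 1, 34, 2000.0),
  (13, "emp3", 1, 33, 3245.0),
  (14, "emp4", 1, 54, 4356.0),
  (15, "emp5", 2, 76, 56789.0)
)).toDF("empid", "name", "deptid", "age", "sal")

// over() with no arguments defines an empty window, so the average is taken
// over all rows and repeated on each of them. "avg_sal" is a hypothetical name.
val withAvg = dataDF20.withColumn("avg_sal", avg(col("sal")).over())

// Both sides of the comparison are now ordinary columns, so filter can evaluate it.
val aboveAvg = withAvg.filter(col("sal") > col("avg_sal")).drop("avg_sal")

aboveAvg.show()
```

With the test data above the average is 13478.0, so this should return the same single row (emp5) as the two-step version. Note that an empty window moves all rows into one partition, so on large data the two-step approach is likely the better choice.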