
Spark DataFrame: finding employees whose salary is above the organization's average salary

I am trying to run some test Spark/Scala code that finds the employees whose salary is above the average salary, using the test DataFrame below. It fails during execution with:

Exception in thread "main" java.lang.UnsupportedOperationException: Cannot evaluate expression: avg(input[4, double, false])

What is the correct syntax to achieve this?

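import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{avg, col}

val dataDF20 = spark.createDataFrame(Seq(
  (11, "emp1", 2, 45, 1000.0),
  (12, "emp2", 1, 34, 2000.0),
  (13, "emp3", 1, 33, 3245.0),
  (14, "emp4", 1, 54, 4356.0),
  (15, "emp5", 2, 76, 56789.0)
)).toDF("empid", "name", "deptid", "age", "sal")

// Aggregate function used directly inside a row-level condition
val condition1: Column = col("sal") > avg(col("sal"))

// Executing this filter throws the UnsupportedOperationException above
val d0 = dataDF20.filter(condition1)
println("------ d0.show() ----")
d0.show()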

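avg is an aggregate function, so it cannot be evaluated row by row inside filter. You can do this in two steps instead: compute the average first, then filter on that value: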

val avgVal = dataDF20.select(avg($"sal")).take(1)(0)(0)
dataDF20.filter($"sal" > avgVal).show()
+-----+----+------+---+-------+
|empid|name|deptid|age|    sal|
+-----+----+------+---+-------+
|   15|emp5|     2| 76|56789.0|
+-----+----+------+---+-------+
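
If you would rather keep everything in a single expression, one possible alternative is to compute the average as a window function over the whole DataFrame and filter on that column. This is only a sketch: the names employees, avg_sal and aboveAvg are made up for illustration, and the local SparkSession is an assumption so the snippet can run on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

// Assumption: a local session, only so that the example is self-contained
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("above-average-salary")
  .getOrCreate()

// Same test data as in the question
val employees = spark.createDataFrame(Seq(
  (11, "emp1", 2, 45, 1000.0),
  (12, "emp2", 1, 34, 2000.0),
  (13, "emp3", 1, 33, 3245.0),
  (14, "emp4", 1, 54, 4356.0),
  (15, "emp5", 2, 76, 56789.0)
)).toDF("empid", "name", "deptid", "age", "sal")

// over() with no arguments defines an empty window, so the average is taken
// over all rows and attached to every row as the helper column avg_sal
val aboveAvg = employees
  .withColumn("avg_sal", avg(col("sal")).over())
  .filter(col("sal") > col("avg_sal"))
  .drop("avg_sal")

aboveAvg.show()
```

Note that an empty window moves all rows into a single partition, so on a large DataFrame the two-step version above is likely to perform better.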
