简体   繁体   中英

How to apply udf on a dataframe and on a column in scala?

I am beginner to scala. I tried scala REPL window in intellij . I have a sample df and trying to test udf function not builtin for understanding.

df :

scala> import org.apache.spark.sql.SparkSession
 val spark: SparkSession = SparkSession.builder.appName("elephant").config("spark.master", "local[*]").getOrCreate()
 val df = spark.createDataFrame(Seq(("A",1),("B",2),("C",3))).toDF("Letter", "Number")
 df.show()

output :

|Letter|Number|
+------+------+
|     A|     1|
|     B|     2|
|     C|     3|
+------+------+

udf for dataframe filter:

scala> def kill_4(n: String) : Boolean = {
     | if (n =="A"){ true} else {false}} // please validate if its correct ???

I tried

df.withColumn("new_col", kill_4(col("Letter"))).show() // please tell correct way???

error error: type mismatch

Second: I tried on a col list:

scala> val lst = df.select("Letter").collect().map(_(0)).toList
val res56: List[Any] = List(A, B, C)

lst.filter(kill_4) //please suggest ???

error mismatch type

You can register udf and use it in code as follows:

import org.apache.spark.sql.functions.col

def kill_4(n: String) : Boolean = {
     if (n =="A"){ true } else {false}
}
 
val kill_udf = udf((x: String) => kill_4(x))

df.select(col("Letter"),col("Number")
    kill_udf(col("Letter")).as("Kill_4") ).show(false)

Please look at the databricks documentation on scala user defined funcitons.

https://docs.databricks.com/spark/latest/spark-sql/udf-scala.html

在此处输入图像描述

You do not need the spark session to create a dataframe. I removed that code.

在此处输入图像描述

Your function had a couple bugs. Since it is very small, I created a inline one. The udf() call allows the function to be used with dataframes. The call to register allows it to be used with Spark SQL.

A quick SQL statement shows the function works.

在此处输入图像描述

Last but not least, we need the udf() and col() functions for the last statement to work.

In short, these three snippets solve your problem.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM