UDF not working when extracting the file name in Spark (Scala)

This is how I am using a UDF on a Spark DataFrame:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

import org.apache.spark.{ SparkConf, SparkContext }
import java.sql.{Date, Timestamp}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.udf

import org.apache.spark.sql.functions.input_file_name
import org.apache.spark.sql.functions.regexp_extract

spark.udf.register("get_cus_val", (filePath: String) => filePath.split("\\.")(4))


val df = sqlContext.read.format("csv").option("header", "true").option("delimiter", "|").option("inferSchema","true").load("s3://trfsdisu/SPARK/FinancialLineItem/MAIN")

val df1With_ = df.toDF(df.columns.map(_.replace(".", "_")): _*)
val column_to_keep = df1With_.columns.filter(v => (!v.contains("^") && !v.contains("!") && !v.contains("_c"))).toSeq
val df1result = df1With_.select(column_to_keep.head, column_to_keep.tail: _*)

df1result.withColumn("DataPartition", get_cus_val(input_file_name)).show()

But when I run this, I get the error below:

<console>:545: error: not found: value get_cus_val
       df1result.withColumn("DataPartition", get_cus_val(input_file_name)).show() 

However, I can get the file name with its full path if I do this:

df1result.withColumn("DataPartition", input_file_name).show()

Any idea what I am missing?

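This doesn't work because spark.udf.register only registers the function under that name for SQL expressions; it does not define a Scala value called get_cus_val, so the compiler reports "not found: value get_cus_val". register does return a UserDefinedFunction, though, so you can bind it to a val and call it on columns: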

val get_cus_val = spark.udf.register("get_cus_val", (filePath: String) => filePath.split("\\.")(4))

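or keep the registration as it is and call the function by name inside a SQL expression (note the parentheses on input_file_name(), which SQL needs in order to treat it as a function rather than a column):

df1result.selectExpr("*", "get_cus_val(input_file_name()) as DataPartition").show()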
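
For reference, the two options above can be put together in a small local sketch. Everything outside the question's own names is an assumption made for illustration: the object name RegisterUdfSketch, the helper fifthToken, the local[1] session, and the temporary file a.b.c.d.Japan.csv. The helper also differs from the question's lambda in that it yields null instead of throwing when the path has fewer than five dot-separated tokens.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

object RegisterUdfSketch {
  // Hypothetical helper: fifth dot-separated token of the path,
  // or None when the path is null or has fewer than five tokens.
  def fifthToken(path: String): Option[String] =
    Option(path).map(_.split("\\.")).filter(_.length > 4).map(_(4))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("register-udf-sketch")
      .getOrCreate()

    // Hypothetical input: one pipe-delimited file whose name has dot-separated tokens.
    val dir = Files.createTempDirectory("registerudf")
    Files.write(dir.resolve("a.b.c.d.Japan.csv"), "id|amount\n1|10\n".getBytes("UTF-8"))
    val df = spark.read.option("header", "true").option("delimiter", "|").csv(dir.toString)

    // register makes the name available to SQL AND returns a UserDefinedFunction,
    // which is what the DataFrame API needs.
    val getCusVal = spark.udf.register("get_cus_val", (p: String) => fifthToken(p).orNull)

    // Option 1: call the returned value on a Column.
    df.withColumn("DataPartition", getCusVal(input_file_name())).show(false)

    // Option 2: call the registered name from a SQL expression.
    df.selectExpr("*", "get_cus_val(input_file_name()) as DataPartition").show(false)

    spark.stop()
  }
}
```

Because the token index is counted over the whole path that input_file_name() returns, any dot in a parent directory or bucket name shifts the result, so the index 4 from the question only holds for paths shaped like the original ones.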

You can also call the registered UDF by name with callUDF (imported from org.apache.spark.sql.functions). This worked for me:

df.withColumn("file_name",callUDF("get_cus_val", input_file_name()))
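
A minimal local sketch of the callUDF route, in case it helps to see the import and the registration side by side. The object name CallUdfSketch, the local[1] session, and the temporary file are assumptions for illustration only; the lambda is the one from the question. As far as I know, Spark 3.2 and later deprecate callUDF in favour of call_udf with the same arguments, so treat the exact function name as version-dependent.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{callUDF, input_file_name}

object CallUdfSketch {
  // Same extraction as in the question: fifth dot-separated token of the path.
  // Throws ArrayIndexOutOfBoundsException if the path has fewer than five tokens.
  val getCusVal: String => String = _.split("\\.")(4)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("call-udf-sketch")
      .getOrCreate()

    // Hypothetical input file, written to a temporary directory.
    val dir = Files.createTempDirectory("calludf")
    Files.write(dir.resolve("a.b.c.d.Japan.csv"), "id|amount\n1|10\n".getBytes("UTF-8"))
    val df = spark.read.option("header", "true").option("delimiter", "|").csv(dir.toString)

    // Registration is what makes the name "get_cus_val" resolvable.
    spark.udf.register("get_cus_val", getCusVal)

    // callUDF looks the function up by its registered name, so no Scala value
    // called get_cus_val has to be in scope.
    df.withColumn("file_name", callUDF("get_cus_val", input_file_name())).show(false)

    spark.stop()
  }
}
```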
