
Spark Scala UDF in DataFrames is not working

I have defined a function that converts epoch time to CET, and I use it in a Spark DataFrame after wrapping it as a UDF. It throws an error and I cannot use it. My code is below.

Function used to convert Epoch time to CET:

import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}
import org.apache.spark.sql.functions.udf

def convertNanoEpochToDateTime(
                                  d: Long,
                                  f: String = "dd/MM/yyyy HH:mm:ss.SSS",
                                  z: String = "CET",
                                  msPrecision: Int = 9
                                ): String = {

    val sdf = new SimpleDateFormat(f)
    sdf.setTimeZone(TimeZone.getTimeZone(z))
    val date = new Date((d / Math.pow(10, 9).toLong) * 1000L)
    val stringTime = sdf.format(date)

    if (f.contains(".S")) {
      val lng = d.toString.length
      val milliSecondsStr = d.toString.substring(lng-9,lng)
      stringTime.substring(0, stringTime.lastIndexOf(".") + 1) + milliSecondsStr.substring(0,msPrecision)
    }
    else stringTime
}

val epochToDateTime = udf(convertNanoEpochToDateTime _)

The Spark DataFrame below uses the UDF defined above to convert epoch time to CET:

val df2 = df1.select($"messageID",$"messageIndex",epochToDateTime($"messageTimestamp").as("messageTimestamp"))

I get the error shown below when I run the code:

[Screenshot: UDF error]

Any idea how I should proceed in this scenario?

The error from Spark tells you that your function is not a Function1, that is, not a function that accepts one parameter. Your function has four input parameters. In Scala you can call the method with only one argument because the other three have default values, but those defaults are lost when the method is converted to a function value with the trailing underscore, and Catalyst does not apply them either. The UDF therefore expects four columns, so you will need to change the definition of your function to something like:

def convertNanoEpochToDateTime(
      f: String = "dd/MM/yyyy HH:mm:ss.SSS"
  )(z: String = "CET")(msPrecision: Int = 9)(d: Long): String

or

def convertNanoEpochToDateTime(f: String)(z: String)(msPrecision: Int)(d: Long): String 

and put the default values in the udf creation:

val epochToDateTime = udf(
  convertNanoEpochToDateTime("dd/MM/yyyy HH:mm:ss.SSS")("CET")(9) _
)

Also try to define the SimpleDateFormat once, as a static transient value outside the function, so that it is not re-created for every row.
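
The arity problem can be reproduced without Spark. The sketch below uses a hypothetical stand-in method, describe, that only concatenates its arguments; the object name ArityDemo and all member names are illustrative and not part of the question's code. It shows that the trailing underscore yields a four-argument function, while the curried form or a lambda yields the one-argument function that a single-column UDF needs.

```scala
object ArityDemo {
  // Hypothetical stand-in for convertNanoEpochToDateTime: same parameter
  // shape, but it only concatenates its arguments.
  def describe(d: Long, f: String = "fmt", z: String = "CET", p: Int = 9): String =
    s"$d|$f|$z|$p"

  // Eta-expansion ignores the defaults, so this is a Function4.
  val fourArgs: (Long, String, String, Int) => String = describe _

  // Option 1: supply the constants first, leaving a Long => String.
  def describeCurried(f: String)(z: String)(p: Int)(d: Long): String =
    describe(d, f, z, p)
  val curried: Long => String = describeCurried("fmt")("CET")(9) _

  // Option 2: a lambda calls the method directly, so the defaults still apply.
  val viaLambda: Long => String = (d: Long) => describe(d)
}
```

Under these assumptions, either one-argument value could be handed to udf in place of the original method, for example udf((d: Long) => convertNanoEpochToDateTime(d)).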

I found the cause of the error and resolved it. When I wrapped the Scala function as a UDF, it expected 4 parameters, but I was passing only one. I removed the other 3 parameters from the signature and defined those values inside the function instead, since they are constants. In the Spark DataFrame I now call the function with a single parameter, and it works.

import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}
import org.apache.spark.sql.functions.udf

def convertNanoEpochToDateTime(
                                  d: Long
                                ): String = {

    val f: String = "dd/MM/yyyy HH:mm:ss.SSS"
    val z: String = "CET"
    val msPrecision: Int = 9

    val sdf = new SimpleDateFormat(f)
    sdf.setTimeZone(TimeZone.getTimeZone(z))
    val date = new Date((d / Math.pow(10, 9).toLong) * 1000L)
    val stringTime = sdf.format(date)

    if (f.contains(".S")) {
      val lng = d.toString.length
      val milliSecondsStr = d.toString.substring(lng-9,lng)
      stringTime.substring(0, stringTime.lastIndexOf(".") + 1) + milliSecondsStr.substring(0,msPrecision)
    }
    else stringTime
}

val epochToDateTime = udf(convertNanoEpochToDateTime _)

import spark.implicits._

val df1 = List(1659962673251388155L,1659962673251388155L,1659962673251388155L,1659962673251388155L).toDF("epochTime")

val df2 = df1.select(epochToDateTime($"epochTime"))
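
As a possible variation on the function above, the sketch below does the same conversion with java.time instead of SimpleDateFormat and string slicing. DateTimeFormatter is immutable and thread-safe, so a single instance can be shared. The object name NanoEpoch and the method name format are assumptions for illustration; the pattern and zone are the constants from the answer, with nine S characters for nanosecond precision.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object NanoEpoch {
  // One shared formatter: day/month/year, time, and a 9-digit fraction of a second.
  private val formatter: DateTimeFormatter =
    DateTimeFormatter
      .ofPattern("dd/MM/yyyy HH:mm:ss.SSSSSSSSS")
      .withZone(ZoneId.of("CET"))

  // Splits the nanosecond epoch into whole seconds and the nanosecond remainder,
  // then formats the resulting instant in the CET zone.
  def format(nanos: Long): String = {
    val seconds = Math.floorDiv(nanos, 1000000000L)
    val nanoOfSecond = Math.floorMod(nanos, 1000000000L)
    formatter.format(Instant.ofEpochSecond(seconds, nanoOfSecond))
  }
}
```

Assuming the same Spark session as above, this could be wrapped as udf(NanoEpoch.format _) and applied to $"epochTime" in the same way.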
