Spark Dataframe UDF - Schema for type Any is not supported

I am writing a Spark UDF in Scala and getting "java.lang.UnsupportedOperationException: Schema for type Any is not supported":

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

val aBP = udf((bG: String, pS: String, bP: String, iOne: String, iTwo: String) => {
  if (bG != "I") {"NA"}
  else if (pS == "D")
    {if (iTwo != null) iOne else "NA"}
  else if (pS == "U")
    {if (bP != null) bP else "NA"}
})

Creating this UDF throws "java.lang.UnsupportedOperationException: Schema for type Any is not supported".

As discussed in this link, your UDF should return one of the following:

  • Primitives (Int, String, Boolean, ...)
  • Tuples of other supported types
  • Lists, Arrays, Maps of other supported types
  • Case Classes of other supported types

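Your if/else chain has no final else, so when none of the conditions match it evaluates to Unit. The compiler therefore infers the return type as Any, which Spark cannot map to a schema. If you add a final else to your code, the return type is inferred as String and the error goes away.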

  val aBP = udf((bG: String, pS: String, bP: String, iOne: String, iTwo: String) => {
    if (bG != "I") {"NA"}
    else if (pS == "D") {
      if (iTwo != null) 
        iOne 
      else "NA"
    } else if (pS == "U") {
      if (bP != null) 
        bP 
      else 
        "NA"
    } else {
      ""
    }
  })
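
The effect of the missing else can be reproduced without Spark. The sketch below is only an illustration; the names `withoutElse` and `withElse` are hypothetical and do not come from the question.

```scala
// An `if` without `else` produces the Unit value `()` when the condition is false,
// so the whole expression is typed as the common supertype of String and Unit: Any.
def withoutElse(bG: String): Any =
  if (bG != "I") "NA"

// With a final `else`, every branch is a String, so the result type is String.
def withElse(bG: String): String =
  if (bG != "I") "NA" else ""
```

A UDF built from a lambda shaped like `withoutElse` has return type Any, which is what triggers the exception. One shaped like `withElse` returns String, which Spark can map to a schema.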

You could also restructure your code using pattern matching:

val aBP = udf[String, String, String, String, String, String] {
  // Untyped patterns are used on purpose: a typed pattern such as `bG: String` never matches null.
  case (bG, _, _, _, _) if bG != "I"           => "NA"
  case (_, "D", _, iOne, iTwo) if iTwo != null => iOne
  case (_, "D", _, _, _)                       => "NA"
  case (_, "U", bP, _, _) if bP != null        => bP
  case (_, "U", _, _, _)                       => "NA"
  case _                                       => ""
}
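
To check that a pattern-matching version behaves like the if/else chain, including for null inputs, one option is to keep the branching logic in a plain Scala function and wrap it in `udf` separately. This is a minimal sketch; the name `aBPLogic` is hypothetical, and the Spark wrapping is left as a comment so that the snippet runs without Spark on the classpath.

```scala
// Hypothetical helper holding only the branching logic, so it can be tested without Spark.
def aBPLogic(bG: String, pS: String, bP: String, iOne: String, iTwo: String): String =
  (bG, pS, bP, iOne, iTwo) match {
    case (g, _, _, _, _) if g != "I"          => "NA" // also covers a null bG, like the if/else version
    case (_, "D", _, one, two) if two != null => one
    case (_, "D", _, _, _)                    => "NA"
    case (_, "U", p, _, _) if p != null       => p
    case (_, "U", _, _, _)                    => "NA"
    case _                                    => ""   // fallback keeps the result type String
  }

// With Spark available, the UDF could then be created as:
//   import org.apache.spark.sql.functions.udf
//   val aBP = udf(aBPLogic _)
```

Calling `aBPLogic("I", "D", "b", "one", null)` takes the second "D" case and returns "NA", which matches `if (iTwo != null) iOne else "NA"` in the if/else version.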
