
Error while calling udf from within withColumn in Spark using Scala

I receive an error when calling a udf from within withColumn in Spark using Scala. The error occurs while building with SBT.

val hiveRDD = sqlContext.sql("select * from iac_trinity.ctg_us_clickstream")
hiveRDD.persist()

val trnEventDf = hiveRDD
  .withColumn("system_generated_id", getAuthId(hiveRDD("session_user_id")))
  .withColumn("application_assigned_event_id", hiveRDD("event_event_id"))


val getAuthId = udf((session_user_id: String) => {
  if (session_user_id != None) {
    if (session_user_id != "NULL") {
      if (session_user_id != "null") {
        session_user_id
      } else "-1"
    } else "-1"
  } else "-1"
})

The error I receive is:

scala:58: No TypeTag available for String
val getAuthId = udf((session_user_id:String) => {

It compiles when I use (session_user_id: Any) instead of (session_user_id: String), but then it fails at runtime because Any is not a supported Spark SQL type. Please let me know how to handle this.

Have you tried declaring your types explicitly?

udf[String, String]((session_user_id:String)...
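One way to apply this suggestion is a minimal sketch along the following lines (`normalizeUserId` is a hypothetical helper name, not from the original post): keep the null-handling logic as a plain Scala function, then wrap it with explicit type parameters `udf[String, String]` so the compiler is given the types directly instead of having to infer a `TypeTag`.

```scala
// Pure normalization logic, kept separate from Spark so it is easy to test.
// A String can never equal None; checking for null and the literal strings
// "NULL"/"null" covers the cases the original nested ifs were aiming at.
def normalizeUserId(sessionUserId: String): String =
  if (sessionUserId == null || sessionUserId.equalsIgnoreCase("null")) "-1"
  else sessionUserId

// Spark wiring (requires spark-sql on the classpath; sketch only):
//   import org.apache.spark.sql.functions.udf
//   val getAuthId = udf[String, String](normalizeUserId _)  // explicit return and argument types
//   val trnEventDf = hiveRDD
//     .withColumn("system_generated_id", getAuthId(hiveRDD("session_user_id")))
```

Separating the function from the `udf` wrapper also makes the null handling unit-testable without a SparkSession.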
