
Scala Spark udf java.lang.UnsupportedOperationException


I created this currying function to check endDateStr for null values inside a udf. The code is as follows (the type of col x is ArrayType[TimestampType]):

    import java.sql.Timestamp
    import org.apache.spark.sql.functions.{col, lit, udf}

    def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)
    def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) = Option(dates).map(_.count(!_.after(endDate)))

    val getCountUDF = udf((endDateStr: Option[String]) => {
      endDateStr match {
        case None => _getCountAll _
        case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
      }
    })
    df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))

But I get this exception when executing it:

java.lang.UnsupportedOperationException: Schema for type Seq[java.sql.Timestamp] => Option[Int] is not supported

Can anyone help me figure out my mistake?

You cannot curry a udf like this: Spark has to derive a SQL schema for the udf's return type, and a function type such as Seq[java.sql.Timestamp] => Option[Int] has no supported schema, which is exactly what the exception says. If you want curry-like behavior, you should return the udf from an outer function:

    def getCountUDF(endDateStr: Option[String]) = udf {
      // The match is evaluated on the driver when the udf is built;
      // the udf Spark registers is a plain Seq[Timestamp] => Option[Int].
      endDateStr match {
        case None => _getCountAll _
        case Some(value) =>
          _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
      }
    }

    df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))
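
With this version, the None branch is exercised by calling the outer function without a cutoff date. A minimal usage sketch (the column name all_dx_count is just illustrative):

    df.withColumn("all_dx_count", getCountUDF(None)(col("x")))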

Otherwise, just give up on currying and pass both arguments at once:

    val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
      endDateStr match {
        // A null end date means no cutoff: count every timestamp.
        case null => _getCountAll(dates)
        case _ =>
          _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
      }
    )

    df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
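
For completeness, here is a minimal sketch of trying this version end to end, reusing the definitions above. The sample data is hypothetical, and lit(null).cast("string") is used to hit the null branch, since lit(null) alone produces a NullType column:

    import java.sql.Timestamp
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lit, udf}

    val spark = SparkSession.builder().appName("count-udf-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: column x holds an array of timestamps.
    val df = Seq(
      Seq(Timestamp.valueOf("2009-09-09 10:00:00"),
          Timestamp.valueOf("2009-09-11 10:00:00"))
    ).toDF("x")

    // Filtered: only timestamps on or before 2009-09-10 23:59:59 are counted.
    df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x"))).show()

    // Unfiltered: a typed null literal takes the `case null` branch.
    df.withColumn("all_dx_count", getCountUDF(lit(null).cast("string"), col("x"))).show()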

