Spark Scala UDF: java.lang.UnsupportedOperationException: Schema for type Any is not supported
I created this curried function to handle null values of endDateStr; the code is as follows (column x has type ArrayType[TimestampType]):
def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)
def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) =
  Option(dates).map(_.count(!_.after(endDate)))

val getCountUDF = udf((endDateStr: Option[String]) => {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
})
df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))
But I get this exception at execution time:
java.lang.UnsupportedOperationException: Schema for type Seq[java.sql.Timestamp] => Option[Int] is not supported
Can anyone help me figure out what I'm doing wrong?
You cannot curry a udf like this. If you want curry-like behavior, you should return the udf from an outer function:
def getCountUDF(endDateStr: Option[String]) = udf {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) =>
      _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
}
df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))
Otherwise, just drop the currying and provide both arguments at once:
val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
  endDateStr match {
    case null => _getCountAll(dates)
    case _ =>
      _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
  }
)
df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
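For reference, the two-argument variant above can be put together as a self-contained sketch. The sample data and the local SparkSession setup are assumptions added for illustration, not part of the original question:

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

object UdfNullCheckDemo {
  // Count all timestamps, or None if the array column is null
  def _getCountAll(dates: Seq[Timestamp]): Option[Int] =
    Option(dates).map(_.length)

  // Count only timestamps that do not fall after endDate
  def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]): Option[Int] =
    Option(dates).map(_.count(!_.after(endDate)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-null-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one row with two timestamps in column x
    val df = Seq(
      Seq(
        Timestamp.valueOf("2009-09-01 00:00:00"),
        Timestamp.valueOf("2009-09-15 00:00:00")
      )
    ).toDF("x")

    // Both arguments are passed to the udf at once; a null end date
    // falls through to the unfiltered count
    val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
      endDateStr match {
        case null => _getCountAll(dates)
        case _ =>
          _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
      }
    )

    df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
      .show()

    spark.stop()
  }
}
```

With the sample row above, only the 2009-09-01 timestamp falls on or before the 2009-09-10 cutoff, so the computed count for that row is 1.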