Spark Scala UDF: java.lang.UnsupportedOperationException: Schema for type Any is not supported
I created this curried function to handle null values of endDateStr; the code is as follows (column x has type ArrayType[TimestampType]):
def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)
def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) =
  Option(dates).map(_.count(!_.after(endDate)))

val getCountUDF = udf((endDateStr: Option[String]) => {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
})
df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))
But I get this exception at execution time:
java.lang.UnsupportedOperationException: Schema for type Seq[java.sql.Timestamp] => Option[Int] is not supported
Can anyone help me figure out what I'm doing wrong?
You cannot curry a udf like this. If you want curry-like behavior, you should return the udf from an outer function:
def getCountUDF(endDateStr: Option[String]) = udf {
  endDateStr match {
    case None => _getCountAll _
    case Some(value) =>
      _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
  }
}
df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))
Otherwise, just drop the currying and provide both arguments at once:
val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
  endDateStr match {
    case null => _getCountAll(dates)
    case _ =>
      _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
  }
)
df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
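For reference, the two-argument variant above can be put together as a self-contained sketch. The sample data and the local SparkSession setup are assumptions added for illustration, not part of the original question:

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, udf}

object UdfNullCheckDemo {
  // Count all timestamps, or None if the array column is null
  def _getCountAll(dates: Seq[Timestamp]): Option[Int] =
    Option(dates).map(_.length)

  // Count only timestamps that do not fall after endDate
  def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]): Option[Int] =
    Option(dates).map(_.count(!_.after(endDate)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-null-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one row with two timestamps in column x
    val df = Seq(
      Seq(
        Timestamp.valueOf("2009-09-01 00:00:00"),
        Timestamp.valueOf("2009-09-15 00:00:00")
      )
    ).toDF("x")

    // Both arguments are passed to the udf at once; a null end date
    // falls through to the unfiltered count
    val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
      endDateStr match {
        case null => _getCountAll(dates)
        case _ =>
          _getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
      }
    )

    df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
      .show()

    spark.stop()
  }
}
```

With the sample row above, only the 2009-09-01 timestamp falls on or before the 2009-09-10 cutoff, so the computed count for that row is 1.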