I have created this currying function to check for null values for endDateStr
inside an udf, the code is as follows:(Type of col x is ArrayType[TimestampType]):
def _getCountAll(dates: Seq[Timestamp]) = Option(dates).map(_.length)
def _getCountFiltered(endDate: Timestamp)(dates: Seq[Timestamp]) = Option(dates).map(_.count(!_.after(endDate)))
val getCountUDF = udf((endDateStr: Option[String]) => {
endDateStr match {
case None => _getCountAll _
case Some(value) => _getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
}
})
df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"))(col("x")))
But I am getting this exception while executing:
java.lang.UnsupportedOperationException: Schema for type Seq[java.sql.Timestamp] => Option[Int] is not supported
Can anyone please help me to figure out my mistake?
You cannot curry udf
like this. If you want curry-like behavior you should return udf
from the outer function:
def getCountUDF(endDateStr: Option[String]) = udf {
endDateStr match {
case None => _getCountAll _
case Some(value) =>
_getCountFiltered(Timestamp.valueOf(value + " 23:59:59")) _
}
}
df.withColumn("distinct_dx_count", getCountUDF(Some("2009-09-10"))(col("x")))
otherwise just drop currying and provide both arguments at the same time:
val getCountUDF = udf((endDateStr: String, dates: Seq[Timestamp]) =>
endDateStr match {
case null => _getCountAll(dates)
case _ =>
_getCountFiltered(Timestamp.valueOf(endDateStr + " 23:59:59"))(dates)
}
)
df.withColumn("distinct_dx_count", getCountUDF(lit("2009-09-10"), col("x")))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.