
Is it possible to use Option with a Spark UDF?

I'd like to use Option as the input type for my functions:

udf((oa: Option[String], ob: Option[String]) => …)

to handle null values in a more functional way.

Is there a way to do that?

As far as I know, it is not directly possible. Nothing stops you from wrapping the arguments in Options:

udf((oa: String, ob: String) => (Option(oa), Option(ob)) match {
  ...
})
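For illustration, here is one hypothetical body for that match (the combining logic is an assumption, not from the original post); the pattern itself is plain Scala and behaves the same outside Spark:

```scala
// Hypothetical UDF body: lift possibly-null inputs into Option, then match.
// Returning null when either input is missing maps back to a SQL NULL.
val combine: (String, String) => String = (oa, ob) =>
  (Option(oa), Option(ob)) match {
    case (Some(a), Some(b)) => s"$a - $b"
    case _                  => null
  }
```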

using Dataset encoders:

import spark.implicits._  // needed for toDF and as; assumes a SparkSession named spark

val df = Seq(("a", None), ("b", Some("foo"))).toDF("oa", "ob")

df.as[(Option[String], Option[String])]
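Once typed this way, ordinary Option combinators apply inside map. A minimal plain-collection sketch of the same transformation (a typed Dataset's map behaves analogously, minus Spark's distribution):

```scala
// Stand-in for the typed Dataset: a Seq of Option pairs, matching df above.
val rows: Seq[(Option[String], Option[String])] =
  Seq((Some("a"), None), (Some("b"), Some("foo")))

// Combine both values only when both are present; None propagates otherwise.
val combined = rows.map { case (oa, ob) =>
  oa.flatMap(a => ob.map(b => s"$a - $b"))
}
// combined == Seq(None, Some("b - foo"))
```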

or adding some implicit conversions:

import scala.language.implicitConversions

implicit def asOption[T](value: T): Option[T] = Option(value)

def foo(oa: Option[String], ob: Option[String]) = {
  oa.flatMap(a => ob.map(b => s"$a - $b"))
}
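With asOption in scope, foo can be called with plain String values, which the implicit lifts into Option (definitions repeated so the sketch stands alone; note a null must arrive via a String-typed value for the conversion to fire):

```scala
import scala.language.implicitConversions

implicit def asOption[T](value: T): Option[T] = Option(value)

def foo(oa: Option[String], ob: Option[String]): Option[String] =
  oa.flatMap(a => ob.map(b => s"$a - $b"))

// Raw strings are implicitly converted; Option(null) becomes None.
val missing: String = null
val both     = foo("a", "foo")   // Some("a - foo")
val withNull = foo("b", missing) // None
```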

def wrap[T, U, V](f: (Option[T], Option[U]) => V) = 
  (t: T, u: U) => f(Option(t), Option(u))

val foo_ = udf(wrap(foo))  // requires import org.apache.spark.sql.functions.udf
df.select(foo_($"oa", $"ob"))
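Outside Spark, the wrap combinator can be checked directly (definitions repeated so the sketch stands alone): nulls are lifted to None before foo ever sees them, so the wrapped function is safe to hand to udf.

```scala
// wrap lifts a null-safe Option function into one taking raw values,
// which is the shape Spark's udf expects.
def wrap[T, U, V](f: (Option[T], Option[U]) => V): (T, U) => V =
  (t: T, u: U) => f(Option(t), Option(u))

def foo(oa: Option[String], ob: Option[String]): Option[String] =
  oa.flatMap(a => ob.map(b => s"$a - $b"))

val rawFoo = wrap(foo)
// rawFoo("a", "foo") == Some("a - foo"); rawFoo("a", null) == None
```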
