I am new to Scala. I need to understand what is going on in the code snippet below, more specifically the sampleFunc val:
val sampleFunc: Seq[Row] => (Int, Long, Boolean, Row, String) = (mem: Seq[Row]) => {
//some code
(a1,b1,c1,d1,e1) // returning the value
}
spark.udf.register("sampleUDF", udf(sampleFunc,
StructType(
Seq(
StructField(a, IntegerType),
StructField(b, LongType),
StructField(c, BooleanType),
StructField(d, StructType(schema.fields)),
StructField(e, StringType)
)
)))
Thanks.
Well, I see that the code snippet uses Spark, but let's leave that aside and just look at sampleFunc. Everything is quite simple. The following construct declares the function itself:
val sampleFunc: Seq[Row] => (Int, Long, Boolean, Row, String) = ...
where Seq[Row] is the function argument type and (Int, Long, Boolean, Row, String) is the function result type. In other words, you create a variable of type Function1[Seq[Row], (Int, Long, Boolean, Row, String)].
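To make the equivalence concrete, here is a minimal sketch that compiles without Spark. The Row case class is a hypothetical stand-in for org.apache.spark.sql.Row, and the values returned are placeholders, not the original "some code".

```scala
object FunctionTypeDemo {
  // Hypothetical stand-in for org.apache.spark.sql.Row so the sketch runs without Spark.
  final case class Row(values: Seq[Any])

  // Sugared function type, with a function literal as the value.
  val sugared: Seq[Row] => (Int, Long, Boolean, Row, String) =
    (mem: Seq[Row]) => (mem.size, 0L, mem.isEmpty, Row(Seq.empty), "sugared")

  // The same type spelled out; the assignment compiles because both types are identical.
  val explicit: Function1[Seq[Row], Tuple5[Int, Long, Boolean, Row, String]] = sugared
}
```

Calling FunctionTypeDemo.explicit(Seq.empty) goes through Function1's apply method, exactly as calling sugared does.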
Then comes the function body, or implementation if you will:
... = (mem: Seq[Row]) => {
//some code
(a1,b1,c1,d1,e1) // returning the value
}
where mem is the function parameter. Its type should be the same as the argument type used in the function type declaration, or a supertype of it, because function argument types are contravariant. (For more examples, see another good SO post: Why is Function[-A1,...,+B] not about allowing any supertypes as parameters?)
The => token marks where the function body begins: what follows it is the body itself.
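A small sketch of how variance applies to function argument types. Function1 is declared as Function1[-T1, +R], so a function that accepts a more general argument can stand in where a more specific one is expected. The names below are hypothetical and use plain String and Int instead of Row.

```scala
object VarianceDemo {
  // The literal's parameter type (Iterable[String]) is a supertype of the
  // declared argument type (Seq[String]); this conforms by contravariance.
  val countAll: Seq[String] => Int = (mem: Iterable[String]) => mem.size

  // A function accepting Any can be used where a function accepting Int is expected.
  val anyToText: Any => String = (x: Any) => s"<$x>"
  val intToText: Int => String = anyToText
}
```

The reverse assignment (an Int => String used as an Any => String) does not compile, since the function could then be handed arguments it cannot handle.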
If you have more of a Java background, or a background in another imperative language, this can also be written as a method:
def sampleFunc(mem: Seq[Row]): (Int, Long, Boolean, Row, String) = {
//some code
(a1,b1,c1,d1,e1) // returning the value
}
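The method form and the function-value form are closely related: where a function type is expected, the compiler can convert a method into a function value (eta-expansion). A minimal sketch, using hypothetical simplified types rather than the Row-based signature:

```scala
object MethodVsFunctionDemo {
  // Method form, analogous to the def version above.
  def describe(mem: Seq[String]): (Int, Boolean) = (mem.size, mem.isEmpty)

  // Because a function type is expected here, the method is converted
  // to a Function1 value automatically.
  val describeFn: Seq[String] => (Int, Boolean) = describe
}
```

Both can be called the same way, for example describe(Seq("x")) and describeFn(Seq("x")).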
Hope this helps!
//<-value name-> <-------------- value type--------------------> <--------------implementation ----------------------->
// <-arg type-> <-----result type ---------------> <-function argument-> <----func implementation ---->
val sampleFunc: Seq[Row] => (Int, Long, Boolean, Row, String) = (mem: Seq[Row]) => { /*...*/; (a1,b1,c1,d1,e1) }
//same written differently:
//<-value name-> <-------------- value type------------------------------> <-------implementation ----------->
val sampleFunc: Function1[Seq[Row], Tuple5[Int,Long, Boolean, Row, String]] = {mem => /*...*/; (a1,b1,c1,d1,e1)}
There is nothing special about the val in your code. Its type is a Function1 type that takes Seq[Row] and returns Tuple5[Int, Long, Boolean, Row, String]. This is just Scala's nicer syntax for it. Its value is a function literal taking a Seq[Row], written using the => syntax. Also nothing special here.
Maybe it's easier for you to understand if you desugar the Tuple5 factory method invocations:
val sampleFunc: Seq[Row] => Tuple5[Int, Long, Boolean, Row, String] =
(mem: Seq[Row]) => Tuple5(a1,b1,c1,d1,e1)
and if you go further and replace the => in the type with Function1, you get:
Function1[Seq[Row], Tuple5[Int, Long, Boolean, Row, String]]
which means that sampleFunc is a function that takes an argument of type Seq[Row] and returns a Tuple5[Int, Long, Boolean, Row, String].
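As a quick check of the tuple desugaring, the parenthesized form and the explicit Tuple5 call build equal values. The element values here are placeholders chosen for illustration, with a String in place of Row.

```scala
object TupleDemo {
  // Parenthesized tuple syntax.
  val sugared = (1, 2L, true, "row", "text")

  // The same value built through the Tuple5 factory method.
  val explicit = Tuple5(1, 2L, true, "row", "text")

  // The components can be read back by pattern matching or via _1 ... _5.
  val (a1, b1, c1, d1, e1) = sugared
}
```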