
Understanding the Scala syntax for a higher-order function

I am new to Scala and need to understand what is going on in the code snippet below, specifically the sampleFunc val:

val sampleFunc: Seq[Row] => (Int, Long, Boolean, Row, String) = (mem: Seq[Row]) => {
  // some code
  (a1, b1, c1, d1, e1) // returning the value
}

spark.udf.register("sampleUDF", udf(sampleFunc,
  StructType(
    Seq(
      StructField(a, IntegerType),
      StructField(b, LongType),
      StructField(c, BooleanType),
      StructField(d, StructType(schema.fields)),
      StructField(e, StringType)
    )
  )))

Thanks.

The snippet uses Spark, but let's set that aside and look only at sampleFunc. It is quite simple. The following construct declares the function's name and type:

val sampleFunc: Seq[Row] => (Int, Long, Boolean, Row, String) = ...

where Seq[Row] is the function's argument type and (Int, Long, Boolean, Row, String) is its result type. In other words, you create a value of type Function1[Seq[Row], (Int, Long, Boolean, Row, String)].
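
To see that the arrow type is only shorthand for Function1, here is a minimal sketch that runs without Spark. Row below is a hypothetical stand-in case class (not Spark's org.apache.spark.sql.Row), and the result tuple is shortened to two elements for brevity.

```scala
object FunctionTypeDemo {
  // Hypothetical stand-in for Spark's Row, so the sketch runs without Spark.
  final case class Row(id: Int)

  // Arrow syntax: a function from Seq[Row] to a pair (Int, String).
  val viaArrow: Seq[Row] => (Int, String) =
    (mem: Seq[Row]) => (mem.size, mem.map(_.id).mkString(","))

  // The same type spelled out; the two annotations are interchangeable,
  // so the value above can be assigned without any conversion.
  val viaTrait: Function1[Seq[Row], Tuple2[Int, String]] = viaArrow

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row(1), Row(2))
    println(viaArrow(rows))       // f(x) is shorthand for f.apply(x)
    println(viaTrait.apply(rows)) // explicit apply, same result
  }
}
```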

Then comes the function body, or implementation if you will:

... = (mem: Seq[Row]) => {
  // some code
  (a1, b1, c1, d1, e1) // returning the value
}

where mem is the function's parameter. Its declared type must be the same as the argument type in the function's declared type, or a supertype of it. (Function1 is contravariant in its argument type and covariant in its result type. For more detail, see this SO post: Why is Function[-A1,...,+B] not about allowing any supertypes as parameters?)
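
The variance rule can be sketched with plain collections. The names below are made up for illustration; the only fact relied on is that the standard library declares Function1[-T1, +R].

```scala
object VarianceDemo {
  // Declared argument type is Seq[Int], but the literal accepts the wider
  // Iterable[Int]. This compiles because Function1 is contravariant in its argument.
  val countAll: Seq[Int] => Int = (xs: Iterable[Int]) => xs.size

  // The result type is covariant: producing a List[Int] where a Seq[Int]
  // is declared is fine, since List[Int] is a subtype of Seq[Int].
  val doubled: Seq[Int] => Seq[Int] = (xs: Seq[Int]) => xs.map(_ * 2).toList

  // The reverse direction is rejected by the compiler (left commented out on purpose):
  // val bad: Iterable[Int] => Int = (xs: Seq[Int]) => xs.size

  def main(args: Array[String]): Unit = {
    println(countAll(Seq(1, 2, 3)))
    println(doubled(Seq(1, 2)))
  }
}
```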

The => token separates the parameter list from the function body that follows it.

If you come from Java or another imperative language, the same logic can also be written as a method:

def sampleFunc(mem: Seq[Row]): (Int, Long, Boolean, Row, String) =  {
  //some code
  (a1,b1,c1,d1,e1) // returning the value
}
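
A method and a function value are not quite the same thing: a function value is an object that can be passed around, which is what a call such as udf(sampleFunc, ...) in the question expects. When a function type is expected, the compiler converts a method to a function value (eta-expansion). A minimal sketch with made-up names:

```scala
object MethodVsFunctionDemo {
  // A method: defined with def, a member of the enclosing object.
  def sumMethod(xs: Seq[Int]): Int = xs.sum

  // A function value: an instance of Function1 stored in a val.
  val sumFunction: Seq[Int] => Int = (xs: Seq[Int]) => xs.sum

  // Eta-expansion: because a function type is expected here,
  // the method is wrapped into a function value automatically.
  val fromMethod: Seq[Int] => Int = sumMethod

  // A higher-order method that takes a function value as an argument.
  def applyTo(data: Seq[Int], f: Seq[Int] => Int): Int = f(data)

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3)
    println(applyTo(data, sumFunction))
    println(applyTo(data, fromMethod))
    println(applyTo(data, sumMethod)) // the method is eta-expanded at the call site
  }
}
```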

Hope this helps!

//<-value name-> <-------------- value type-------------------->   <--------------implementation ----------------------->
//              <-arg type-> <-----result type --------------->   <-function argument->   <----func implementation ---->
val  sampleFunc:  Seq[Row]  => (Int, Long, Boolean, Row, String) = (mem: Seq[Row])      => { /*...*/; (a1,b1,c1,d1,e1) }


//same written differently:
//<-value name-> <-------------- value type------------------------------>   <-------implementation ----------->
val sampleFunc: Function1[Seq[Row], Tuple5[Int,Long, Boolean, Row, String]] = {mem => /*...*/; (a1,b1,c1,d1,e1)}
  • value name: nothing special here. It is just another val in your code.
  • value type: it is long but pretty straightforward. It is a Function1 that takes a Seq[Row] and returns a Tuple5[Int, Long, Boolean, Row, String]. The A => B and (A, B, ...) forms are just Scala's nicer syntax for Function1 and TupleN.
  • implementation: we create a function that takes a Seq[Row], using the => syntax. In the second form the parameter type is omitted because the compiler infers it from the declared value type.
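
The two ways of writing the implementation in the breakdown above can be compared side by side. This is only a sketch: it uses Seq[String] and an Int result instead of the Spark types, and the names are made up.

```scala
object InferredParamDemo {
  // Parameter type written out explicitly, as in the first form.
  val explicitParam: Seq[String] => Int =
    (mem: Seq[String]) => mem.map(_.length).sum

  // Parameter type inferred from the declared value type, as in the second form.
  // The braces allow a body with several statements.
  val inferredParam: Seq[String] => Int = { mem =>
    val lengths = mem.map(_.length) // stands in for "some code"
    lengths.sum                     // the last expression is the result
  }

  def main(args: Array[String]): Unit = {
    val words = Seq("ab", "cde")
    println(explicitParam(words))
    println(inferredParam(words))
  }
}
```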

Maybe it's easier to understand if you desugar the tuple syntax into explicit Tuple5 factory calls:

val sampleFunc: Seq[Row] => Tuple5[Int, Long, Boolean, Row, String] = 
    (mem: Seq[Row]) => Tuple5(a1,b1,c1,d1,e1)

and if you go further and replace the => in the type with Function1 you get:

Function1[Seq[Row], Tuple5[Int, Long, Boolean, Row, String]]

which means that sampleFunc is a function that takes an argument of type Seq[Row] and returns a Tuple5[Int, Long, Boolean, Row, String].
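
The tuple desugaring can be checked in isolation. In this sketch a Tuple4 of simple types is used instead of the Tuple5 with a Row, so that it runs without Spark; the names are made up.

```scala
object TupleSugarDemo {
  // Parenthesized syntax for both the type and the value.
  val sugared: (Int, Long, Boolean, String) = (1, 2L, true, "x")

  // The same type and value written with the TupleN class name.
  val explicit: Tuple4[Int, Long, Boolean, String] = Tuple4(1, 2L, true, "x")

  // Both annotations name the same type, so either value fits either annotation.
  val crossAssigned: (Int, Long, Boolean, String) = explicit

  def main(args: Array[String]): Unit = {
    println(sugared == explicit) // tuples compare element by element
    println(sugared._4)          // fields are accessed as _1, _2, ...
  }
}
```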


 