
Confusion about passing parameters in Scala

Two of the groupBy methods in Spark's RDD are declared as:

def groupBy[K](f: T => K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]
def groupBy[K](f: T => K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

I define a function f as:

def f(x: Int): Int = x % 2

I can pass f directly to the first groupBy as rdd.groupBy(f).

Why can't I pass f to the second groupBy in the same way, as rdd.groupBy(f, 10)? I have to use rdd.groupBy(f(_), 10) or rdd.groupBy(x => f(x), 10).

You wrote: "I define a function f as:"

 def f(x: Int): Int = x % 2

That's not a function, that's a method. The two are fundamentally different:

  • Methods can be generic, functions can't.
  • Methods can have optional parameters with default arguments, functions can't.
  • Methods can have varargs, functions can't.
  • Methods can have implicit arguments, functions can't.
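As a rough illustration of the four points above, here is a minimal sketch in Scala 2 syntax. The object and member names (MethodOnlyFeatures, identityOf, greet, sum, show, greetFn) are made up for this example and do not come from Spark or the question.

```scala
object MethodOnlyFeatures {
  // Generic method: the type parameter A is chosen at each call site.
  def identityOf[A](a: A): A = a

  // Method with a default argument: `greeting` may be omitted by the caller.
  def greet(name: String, greeting: String = "Hello"): String = s"$greeting, $name"

  // Varargs method: accepts any number of Int arguments.
  def sum(xs: Int*): Int = xs.sum

  // Method with an implicit parameter list (passed explicitly in the usage below).
  def show(x: Int)(implicit prefix: String): String = prefix + x

  // A function value of type (String, String) => String.
  // The function type has no place to record a default, so both arguments are always required.
  val greetFn: (String, String) => String = (name, greeting) => s"$greeting, $name"
}
```

With these definitions, MethodOnlyFeatures.greet("Ann") compiles because the method supplies the default, whereas MethodOnlyFeatures.greetFn("Ann") does not compile: the function always needs both arguments.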

These are 4 restrictions that functions have compared to methods. Now, if they are so restricted, why do we use them? Well, there is one major advantage functions have:

  • Functions are objects, methods aren't (they belong to objects).

This means functions can be assigned to vals and vars, passed as arguments to functions, methods and constructors, and returned from functions and methods. Methods cannot do any of that: Scala is an object-oriented language, and all entities that can be manipulated by the program are objects … and methods aren't.
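A small sketch of what "functions are objects" means in practice. The names below (FunctionsAreValues, parity, table, applyTwice, adder) are hypothetical and only serve this example.

```scala
object FunctionsAreValues {
  // A function value: an instance of Function1[Int, Int], i.e. an ordinary object.
  val parity: Int => Int = x => x % 2

  // Because it is an object, it can be stored in a collection ...
  val table: Map[String, Int => Int] = Map("parity" -> parity)

  // ... passed as an argument to a method ...
  def applyTwice(g: Int => Int, x: Int): Int = g(g(x))

  // ... and returned from a method.
  def adder(n: Int): Int => Int = x => x + n
}
```

For instance, FunctionsAreValues.applyTwice(FunctionsAreValues.adder(3), 1) applies the returned function twice and yields 7.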

So, why does

rdd.groupBy(f)

work?

Well, you can convert a method into a partially applied function (here "partially applied" means "partially applied to this", not to a subset of the arguments) via η-expansion:

val fn = f _
// => fn: Int => Int = <function1>

Here, as is so often the case in Scala, the underscore is used as a placeholder (in this case for the yet to be supplied arguments). We have fixed the this of the method and left the arguments open, and created a function corresponding to that method.

In some cases, Scala will know that you want to perform η-expansion even without explicitly providing the underscore. That's why

rdd.groupBy(f)

works. This is called implicit η-expansion (§6.26.2 case 3 of the Scala Language Specification) and, because of ambiguities, works only in a limited number of cases.
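The following sketch contrasts explicit and implicit η-expansion without needing Spark. It assumes a hypothetical, non-overloaded helper named groupLocally that stands in for a higher-order method. It is not part of the RDD API and says nothing about how the overloaded groupBy is resolved.

```scala
object EtaExpansion {
  // The method from the question.
  def f(x: Int): Int = x % 2

  // Hypothetical stand-in for a non-overloaded higher-order method.
  def groupLocally[K](xs: Seq[Int])(key: Int => K): Map[K, Seq[Int]] = xs.groupBy(key)

  // Explicit η-expansion with the underscore.
  val explicit: Int => Int = f _

  // Implicit η-expansion: the expected type is a function type, so no underscore is needed.
  val expected: Int => Int = f

  // Implicit η-expansion at a call site: the parameter type is Int => K.
  val grouped: Map[Int, Seq[Int]] = groupLocally(Seq(1, 2, 3, 4))(f)
}
```

Here grouped maps 1 to Seq(1, 3) and 0 to Seq(2, 4). In both implicit cases the compiler sees that a function type is expected where the method name f appears.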

However, having explained all this, I must admit that I don't see why your second example doesn't work. According to my reading of the spec, it should.

In other words: the fundamental problem you seem to be having is that you confuse functions and methods, but in this particular case it should actually work (at least according to my interpretation of the spec, although clearly not according to the compiler writers' interpretation).
