Two groupBy methods in Spark's RDD are declared as:
def groupBy[K](f: T => K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]
def groupBy[K](f: T => K, numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]
I define a function f as:
def f(x: Int): Int = x % 2
I can just pass f to the first groupBy as rdd.groupBy(f).
Why can't I just pass f to the second groupBy as rdd.groupBy(f, 10)? I have to use rdd.groupBy(f(_), 10) or rdd.groupBy(x => f(x), 10).
I define a function f as:
def f(x: Int): Int = x % 2
That's not a function, that's a method. The two are fundamentally different.
These are 4 restrictions that functions have compared to methods. Now, if they are so restricted, why do we use them at all? Well, functions have one major advantage: functions are objects.
This means that functions can be assigned to vals and vars, passed as arguments to functions, methods and constructors, and returned from functions and methods. Methods can do none of that: Scala is an object-oriented language, all entities that a program can manipulate are objects … and methods aren't.
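A minimal sketch in plain Scala (no Spark) of what "functions are objects" buys you. The names FunctionsAreObjects, parity, applyTo and adder are hypothetical and exist only for illustration; the point is that a function value can be stored, passed, returned, and even has methods of its own (apply, andThen), because it is an instance of Function1.

```scala
object FunctionsAreObjects {
  // A function value: an instance of the trait Function1[Int, Int].
  val parity: Int => Int = x => x % 2

  // Functions can be stored in vals and vars ...
  var current: Int => Int = parity

  // ... passed as arguments ...
  def applyTo(g: Int => Int, x: Int): Int = g(x)

  // ... and returned as results.
  def adder(n: Int): Int => Int = x => x + n

  // Being objects, they also have methods of their own, e.g. andThen.
  val parityPlusTen: Int => Int = parity.andThen(adder(10))

  def main(args: Array[String]): Unit = {
    println(parity.apply(7))                      // 1
    println(applyTo(current, 4))                  // 0
    println(parityPlusTen(3))                     // 11
    println(parity.isInstanceOf[Function1[_, _]]) // true
  }
}
```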
So, why does rdd.groupBy(f) work?
Well, you can convert a method into a partially applied function (here "partially applied" means "partially applied to this", not to a subset of the arguments) via η-expansion:
val fn = f _
// => fn: Int => Int = <function1>
Here, as is so often the case in Scala, the underscore is used as a placeholder (in this case for the yet-to-be-supplied arguments). We have fixed the this of the method (its receiver), left the arguments open, and thereby created a function corresponding to that method.
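To make "fixing the this" concrete, here is a small sketch with a hypothetical class Modulo whose method depends on a field of its receiver. Each η-expanded function remembers the instance it was created from, while the argument stays open. It also builds the placeholder form f(_) and the lambda form x => f(x) from the question, which yield equivalent functions.

```scala
object EtaExpansionDemo {
  // Hypothetical class whose method f depends on the receiver's field mod.
  class Modulo(val mod: Int) {
    def f(x: Int): Int = x % mod
  }

  val byTwo = new Modulo(2)
  val byThree = new Modulo(3)

  // Explicit eta-expansion: each function captures its own receiver (this),
  // while the argument x stays open.
  val fnTwo = byTwo.f _
  val fnThree = byThree.f _

  // The placeholder and lambda forms build equivalent functions.
  val viaPlaceholder: Int => Int = byThree.f(_)
  val viaLambda: Int => Int = x => byThree.f(x)

  def main(args: Array[String]): Unit = {
    println(fnTwo(8))          // 0
    println(fnThree(8))        // 2
    println(viaPlaceholder(8)) // 2
    println(viaLambda(8))      // 2
  }
}
```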
In some cases, Scala knows that you want to perform η-expansion even without an explicit underscore. That's why rdd.groupBy(f) works. This is called implicit η-expansion (§6.26.2 case 3 of the Scala Language Specification) and, because of ambiguities, it works only in a limited number of cases.
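A minimal sketch of implicit η-expansion, using hypothetical names (ImplicitEtaDemo, countWhere): whenever the expected type is a function type, the method f is converted to a function without any underscore. In Scala 2, a bare val h = f with no expected function type is rejected, which is why every use below supplies one, either through a type annotation or through a function-typed parameter.

```scala
object ImplicitEtaDemo {
  def f(x: Int): Int = x % 2

  // The expected type Int => Int triggers eta-expansion; no underscore needed.
  val g: Int => Int = f

  // Hypothetical, non-overloaded method with a function parameter.
  def countWhere(p: Int => Int, xs: List[Int]): Int = xs.count(x => p(x) == 1)

  // f is eta-expanded because countWhere's first parameter has a function type.
  val odds: Int = countWhere(f, List(1, 2, 3, 4, 5))

  // Same mechanism with the standard library's List.groupBy.
  val grouped: Map[Int, List[Int]] = List(1, 2, 3, 4, 5).groupBy(f)

  def main(args: Array[String]): Unit = {
    println(g(7))       // 1
    println(odds)       // 3
    println(grouped(0)) // List(2, 4)
  }
}
```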
However, having explained all this, I must admit that I don't see why your second example doesn't work. According to my reading of the spec, it should.
In other words: the fundamental problem you seem to be having is that you confuse functions and methods, but in this particular case it should actually work (at least according to my interpretation of the spec, although clearly not according to the compiler writers' interpretation).
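For experimenting without a Spark cluster, here is a minimal sketch. MiniRDD and GroupByDemo are hypothetical names: MiniRDD declares only the two groupBy signatures quoted in the question (Spark's real RDD API may declare more overloads) and simply groups an in-memory Seq with the standard library's groupBy, ignoring numPartitions. It exercises the three call forms the question reports as working: rdd.groupBy(f), rdd.groupBy(f(_), 10) and rdd.groupBy(x => f(x), 10).

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for Spark's RDD (not Spark's API): it declares only the
// two groupBy signatures quoted in the question and groups an in-memory Seq.
class MiniRDD[T](val data: Seq[T]) {
  def groupBy[K](f: T => K)(implicit kt: ClassTag[K]): Seq[(K, Iterable[T])] =
    data.groupBy(f).toSeq

  // numPartitions is accepted but ignored: there is no partitioning here.
  def groupBy[K](f: T => K, numPartitions: Int)(implicit kt: ClassTag[K]): Seq[(K, Iterable[T])] =
    data.groupBy(f).toSeq
}

object GroupByDemo {
  def f(x: Int): Int = x % 2

  val rdd = new MiniRDD(Seq(1, 2, 3, 4, 5))

  // Single-argument overload: the method f is passed directly.
  val direct: Map[Int, Iterable[Int]] = rdd.groupBy(f).toMap

  // Two-argument overload, using the two workarounds from the question.
  // Both spell out a function literal explicitly.
  val placeholder: Map[Int, Iterable[Int]] = rdd.groupBy(f(_), 10).toMap
  val lambda: Map[Int, Iterable[Int]] = rdd.groupBy(x => f(x), 10).toMap

  def main(args: Array[String]): Unit = {
    println(direct)
    println(placeholder)
    println(lambda)
  }
}
```

The last two forms hand groupBy an explicit function literal, so they do not depend on implicit η-expansion of f at all.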