It's not clear to me why the (non-serializable) implicit val gets serialized here (an exception is thrown):
implicit val sc2:SparkContext = sc
val s1 = "asdf"
sc.parallelize(Array(1,2,3)).map(x1 => s1.map(x => 4))
but not when the string is used directly inside the closure instead of through s1:
implicit val sc2:SparkContext = sc
sc.parallelize(Array(1,2,3)).map(x1 => "asdf".map(x => 4))
My use case is obviously more complicated but I've boiled it down to this issue.
(The solution is to mark the implicit val as @transient.)
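Why @transient helps can be sketched without Spark at all, using plain Java serialization, which is what Spark uses to serialize closures. This is a minimal sketch assuming Scala 2.12 or 2.13. FakeContext is a hypothetical, deliberately non-serializable stand-in for SparkContext, and PlainC, TransientC and TransientDemo.serializes are illustrative names, not Spark APIs. In both classes the closure reads the member s1 and therefore captures the whole enclosing instance.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

// Without @transient: sc2 is a regular field and gets serialized with the instance.
class PlainC(ctx: FakeContext) extends Serializable {
  implicit val sc2: FakeContext = ctx
  val s1 = "asdf"
  // Reads this.s1, so the function captures the whole PlainC instance.
  def closure: Int => Int = x1 => s1.length
}

// With @transient: Java serialization skips the sc2 field.
class TransientC(ctx: FakeContext) extends Serializable {
  @transient implicit val sc2: FakeContext = ctx
  val s1 = "asdf"
  // Also captures `this`, but only s1 is actually written out.
  def closure: Int => Int = x1 => s1.length
}

object TransientDemo {
  // True if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val ctx = new FakeContext
    println(serializes(new PlainC(ctx).closure))     // false: sc2 is dragged along
    println(serializes(new TransientC(ctx).closure)) // true: sc2 is skipped
  }
}
```

Keep in mind that a @transient field comes back as null after deserialization, so code running on the executors must never actually use sc2.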
That depends on the scope in which these lines reside. Let's look at three options: in a method, in a class without s1, and in a class with s1:
import org.apache.spark.SparkContext

object TTT {
  val sc = new SparkContext("local", "test")

  def main(args: Array[String]): Unit = {
    new A().foo() // works
    new B         // works
    new C         // fails
  }

  class A {
    def foo(): Unit = {
      // no problem here: local vals are captured individually, and since the
      // closure never references sc2, only s1 (a String) is serialized
      implicit val sc2: SparkContext = sc
      val s1 = "asdf"
      sc.parallelize(Array(1, 2, 3)).map(x1 => s1.map(x => 4)).count()
      println("in A - works!")
    }
  }

  class B {
    // no problem here: B isn't serialized at all because the closure references none of its members
    implicit val sc2: SparkContext = sc
    sc.parallelize(Array(1, 2, 3)).map(x1 => "asdf".map(x => 4)).count()
    println("in B - works!")
  }

  class C extends Serializable {
    implicit val sc2: SparkContext = sc
    val s1 = "asdf" // to serialize s1, Spark will try serializing the C instance, which will serialize sc2
    sc.parallelize(Array(1, 2, 3)).map(x1 => s1.map(x => 4)).count() // fails
  }
}
Bottom line: implicit or not, this fails if and only if s1 and sc2 are both members of a class, because the closure then has to serialize the class instance, which "drags" both of them along with it.
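The difference between class A and class C above can be reproduced the same way, again without Spark. This is a sketch assuming Scala 2.12 or 2.13, where a lambda that does not reference `this` does not capture it. FakeContext is a hypothetical non-serializable stand-in for SparkContext, and CaptureDemo.serializes is an illustrative helper, not a Spark API. memberClosure reads the member s1 the way class C does, while localCopyClosure first copies it into a local val, which is effectively what happens with the local vals in class A.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

class C(ctx: FakeContext) extends Serializable {
  implicit val sc2: FakeContext = ctx
  val s1 = "asdf"

  // s1 here means this.s1, so the function captures the C instance,
  // and serializing it drags the non-serializable sc2 along.
  def memberClosure: Int => Int = x1 => s1.length

  // The member is copied into a local val first; the function captures only
  // that String and never `this`, so sc2 is not involved.
  def localCopyClosure: Int => Int = {
    val localS1 = s1
    x1 => localS1.length
  }
}

object CaptureDemo {
  // True if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val c = new C(new FakeContext)
    println(serializes(c.memberClosure))    // false
    println(serializes(c.localCopyClosure)) // true
  }
}
```

Copying a member into a local val right before the closure is a commonly used alternative to @transient; it is shown here only as an illustration of the capture rule, not as part of the original answer.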
The scope is the spark-shell REPL. In this case, sc2 (and any other implicit val defined in the top-level REPL scope) is only serialized when it is implicit AND another val from that scope is used in the RDD operation. This makes sense because implicit values need to be made available globally and hence are automatically serialized to all worker nodes.