
Spark “Task not serializable” when using field variables

My test code is simple and mostly copied from the Spark example:

import org.apache.spark.sql.SparkSession

import scala.util.Properties

class MyTest(sparkSession: SparkSession, properties: java.util.Properties) {

  val spark: SparkSession = sparkSession

  val sparkHome = Properties.envOrElse("SPARK_HOME", "/spark")
  val props = properties

  def run(): Unit = {
    val logFile = sparkHome + "/README.md"
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains(props.get("v1"))).count()
    val numBs = logData.filter(line => line.contains(props.get("v2"))).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")

  }
}

However, when I run it, it always fails with Exception in thread "main" org.apache.spark.SparkException: Task not serializable, and the stack trace points to the line val numAs = logData.filter(line => line.contains(props.get("v1"))).count()

If I change it to

val v1 = props.get("v1")
val v2 = props.get("v2")

val numAs = logData.filter(line => line.contains(v1)).count()
val numBs = logData.filter(line => line.contains(v2)).count()

the exception is gone. My guess is that Spark is complaining that props cannot be serialized. However, java.util.Properties does implement java.io.Serializable, through Hashtable:

class Properties extends Hashtable<Object,Object> {

and Hashtable

public class Hashtable<K,V>
    extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable {

Why do I still get this exception?

I need to do this because my Spark job takes some command-line parameters, and I need to pass them into the instance of my Spark job class. Is there a best practice for this?

This line

line => line.contains(props.get("v1"))

implicitly captures this, which is an instance of MyTest, since it is the same as:

line => line.contains(this.props.get("v1"))

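and MyTest is not serializable. Spark has to serialize the closure together with everything it references in order to ship it to the executors, so the task fails to serialize even though props itself is Serializable.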
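The capture can be reproduced without Spark. The following is a minimal sketch that uses plain Java serialization as a stand-in for what Spark does with a closure. The names ClosureCaptureDemo, Holder, capturingThis, capturingLocal and isSerializable are hypothetical and only serve the illustration. It also uses getProperty, which returns a String, instead of get, which returns an Object.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.Properties

object ClosureCaptureDemo {

  // Hypothetical helper: true if Java serialization accepts the object.
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  // Stand-in for MyTest: a plain class that is NOT Serializable.
  class Holder(properties: Properties) {
    val props: Properties = properties

    // Reads the field `props`, i.e. `this.props`, so the closure captures `this`.
    def capturingThis: String => Boolean =
      line => line.contains(props.getProperty("v1"))

    // Copies the value into a local first, so the closure captures only a String.
    def capturingLocal: String => Boolean = {
      val v1 = props.getProperty("v1")
      line => line.contains(v1)
    }
  }

  def main(args: Array[String]): Unit = {
    val p = new Properties()
    p.setProperty("v1", "a")
    val holder = new Holder(p)
    println(s"capturingThis serializable:  ${isSerializable(holder.capturingThis)}")
    println(s"capturingLocal serializable: ${isSerializable(holder.capturingLocal)}")
  }
}
```

Under these assumptions, the first closure should be rejected because serializing it drags the Holder instance along, while the second should be accepted because it only holds a String.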

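To fix it, define val props = properties inside the run() method instead of in the class body, so that the closure captures only a local variable. This is also why the version with the local v1 and v2 works.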
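Applied to the class from the question, the change might look like the sketch below. It is one possible arrangement, not the only one: passing logFile as a parameter and returning the two counts are assumptions made here to keep the example small, and getProperty is used so that v1 and v2 are Strings.

```scala
import org.apache.spark.sql.SparkSession

class MyTest(spark: SparkSession, properties: java.util.Properties) {

  // Assumption: the file path is passed in rather than derived from SPARK_HOME.
  def run(logFile: String): (Long, Long) = {
    // Local copies: the closures below capture these Strings, not `this`.
    val v1 = properties.getProperty("v1")
    val v2 = properties.getProperty("v2")

    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains(v1)).count()
    val numBs = logData.filter(line => line.contains(v2)).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    (numAs, numBs)
  }
}
```

For the command-line parameters, the same idea applies: read them on the driver into plain local values (Strings, numbers, or a small case class) before building the closure, so that only those values are shipped to the executors.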

