
Generic parser class `Task not serializable`

I'm trying to construct a class that receives a parser as an argument and applies that parser to each line. Below is a minimal example that you can paste into spark-shell.

import scala.util.{Success,Failure,Try}
import scala.reflect.ClassTag

class Reader[T : ClassTag](makeParser: () => (String => Try[T])) {

  def read(): Seq[T] = {

    val rdd = sc.parallelize(Seq("1","2","oops","4")) mapPartitions { lines =>

      // Since making a parser can be expensive, we want to make only one per partition.
      val parser: String => Try[T] = makeParser()

      lines flatMap { line =>
        parser(line) match {
          case Success(record) => Some(record)
          case Failure(_) => None
        }
      }
    }

    rdd.collect()
  }
}

class IntParser extends (String => Try[Int]) with Serializable {
  // There could be an expensive setup operation here...
  def apply(s: String): Try[Int] = Try { s.toInt }
}

However, when I try to run something like new Reader(() => new IntParser).read() (which type-checks just fine) I get the dreaded org.apache.spark.SparkException: Task not serializable error relating to closures.

Why is there an error and is there a way to re-engineer the above to avoid this (while keeping Reader generic)?

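The problem is that makeParser is a constructor parameter of Reader, and because read() uses it, the compiler keeps it as a field of the Reader instance. The closure passed to mapPartitions therefore refers to this.makeParser, so Spark has to serialize the whole Reader instance along with the closure. Reader is not serializable, which is why you get the Task not serializable exception.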
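The capture can be reproduced without Spark, since Spark serializes closures with plain Java serialization. Below is a minimal sketch, assuming Scala 2.12 or later; CaptureDemo, Holder, serializes, usesMember and usesNothing are hypothetical names made up for illustration, not part of the question's code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object CaptureDemo {
  // Hypothetical helper: true if plain Java serialization accepts the object.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Deliberately not Serializable, like Reader in the question.
  class Holder(prefix: String) {
    // `prefix` is really `this.prefix`, so the function captures the whole Holder.
    def usesMember: String => String = s => prefix + s

    // Touches no member of Holder, so nothing from the instance is captured.
    def usesNothing: String => String = s => s.trim
  }
}
```

Calling CaptureDemo.serializes on usesMember should fail because the Holder instance comes along with the function, while usesNothing should serialize on its own.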

Adding Serializable to the class Reader will make your code work. But that is not good practice, since the entire Reader instance gets serialized with the closure, including any fields the task does not need.
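A less invasive alternative to marking Reader as Serializable is to copy makeParser into a local val before the transformation, so the closure captures only that function value. The following is a sketch of that idea, assuming it is pasted into spark-shell (where sc is predefined); the name mkParser is made up for illustration.

```scala
import scala.reflect.ClassTag
import scala.util.{Failure, Success, Try}

// Assumes spark-shell, where `sc` is the predefined SparkContext.
class Reader[T : ClassTag](makeParser: () => (String => Try[T])) {

  def read(): Seq[T] = {
    // Local copy: the closure below captures this value instead of `this`.
    val mkParser = makeParser

    val rdd = sc.parallelize(Seq("1", "2", "oops", "4")).mapPartitions { lines =>
      // Still only one parser per partition, built on the executor.
      val parser: String => Try[T] = mkParser()

      lines.flatMap { line =>
        parser(line) match {
          case Success(record) => Some(record)
          case Failure(_)      => None
        }
      }
    }

    rdd.collect().toSeq
  }
}

class IntParser extends (String => Try[Int]) with Serializable {
  // There could be an expensive setup operation here...
  def apply(s: String): Try[Int] = Try(s.toInt)
}
```

With this change Reader stays generic and non-serializable, and new Reader(() => new IntParser).read() should be able to run, provided that the factory function and the parser it returns are themselves serializable.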

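In general, you can use function values instead of methods to avoid serialization issues. A method belongs to its enclosing instance, so passing it to a transformation pulls that instance into the closure. A Scala function, by contrast, is an object in its own right, so when it is passed directly, only the function itself is serialized.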
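The method-versus-function distinction can also be checked with plain Java serialization. This is a small sketch assuming Scala 2.12 or later; MethodVsFunction, Ops, canSerialize and the member names are hypothetical and exist only for this illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object MethodVsFunction {
  // Hypothetical helper: true if plain Java serialization accepts the object.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Deliberately not Serializable.
  class Ops {
    // A method: it can only be called through an Ops instance.
    def incMethod(a: Int): Int = a + 1

    // A function value: a standalone object that references no member of Ops.
    val incFunction: Int => Int = a => a + 1

    // Turning the method into a function wraps a call on `this`.
    def viaMethod: Int => Int = incMethod

    // Returns the function object itself; `this` is not part of it.
    def viaFunction: Int => Int = incFunction
  }
}
```

Here viaMethod yields a function that holds on to the Ops instance and so should not serialize, whereas viaFunction hands over an object that is independent of it. Note that this only helps when the function value is passed directly: a closure that reads this.incFunction from inside its body would still capture the instance.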

Refer to this answer: Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
