My test code is pretty simple and mostly copied from the Spark example:
import org.apache.spark.sql.SparkSession
import scala.util.Properties

class MyTest(sparkSession: SparkSession, properties: java.util.Properties) {
    val spark: SparkSession = sparkSession
    val sparkHome = Properties.envOrElse("SPARK_HOME", "/spark")
    val props = properties

    def run(): Unit = {
        val logFile = sparkHome + "/README.md"
        val logData = spark.read.textFile(logFile).cache()
        val numAs = logData.filter(line => line.contains(props.get("v1"))).count()
        val numBs = logData.filter(line => line.contains(props.get("v2"))).count()
        println(s"Lines with a: $numAs, Lines with b: $numBs")
    }
}
However, when I try to run it, it always fails with
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
and points to this line:
val numAs = logData.filter(line => line.contains(props.get("v1"))).count()
But after I change it to
val v1 = props.get("v1")
val v2 = props.get("v2")
val numAs = logData.filter(line => line.contains(v1)).count()
val numBs = logData.filter(line => line.contains(v2)).count()
The exception is gone. I think Spark is complaining that props can't be serialized. However, java.util.Properties does implement java.io.Serializable, through Hashtable:
class Properties extends Hashtable<Object,Object> {
and Hashtable
public class Hashtable<K,V>
extends Dictionary<K,V>
implements Map<K,V>, Cloneable, java.io.Serializable {
Why do I still get this exception?
I need to do this because my Spark job takes some command-line parameters, and I need to pass them into my Spark job class instance. Is there a best practice for doing this?
This line
line => line.contains(props.get("v1"))
implicitly captures this, which is MyTest, since it is the same as:
line => line.contains(this.props.get("v1"))
and MyTest is not serializable.
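The capture can be reproduced without Spark, using plain Java serialization. The sketch below is only an illustration: Holder, viaField and serializationError are hypothetical names, Holder stands in for MyTest, and getProperty is used instead of get so that the looked-up value is a String. It assumes a Scala version where function literals are serializable (2.12 or later).

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.Properties

object CaptureDemo {
  // Hypothetical stand-in for MyTest: it does not extend Serializable.
  class Holder(properties: Properties) {
    val props: Properties = properties

    // `props` here means `this.props`, so the lambda captures the whole Holder.
    def viaField: String => Boolean =
      line => line.contains(props.getProperty("v1"))
  }

  // Returns None if obj serializes, or the exception message if it does not.
  def serializationError(obj: AnyRef): Option[String] =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      None
    } catch {
      case e: NotSerializableException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit = {
    val p = new Properties()
    p.setProperty("v1", "a")
    // The failure is reported for Holder, not for java.util.Properties.
    println(serializationError(new Holder(p).viaField))
  }
}
```

The message of a NotSerializableException is the name of the offending class, so the output should name Holder rather than Properties. That matches the explanation above: the problem is the enclosing instance, not props itself.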
Define val props = properties inside the run() method, not in the class body.
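Here is a minimal sketch of that fix, again without Spark so it can be run on its own. Holder, predicate and canSerialize are hypothetical names, Holder stands in for MyTest, and getProperty is an assumption made so the value is a String. Because props is now a local val, the lambda captures only the java.util.Properties object, which is serializable, and no longer drags in the enclosing instance.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.Properties

object LocalCopyDemo {
  // Hypothetical stand-in for MyTest; still not Serializable.
  class Holder(properties: Properties) {
    def predicate: String => Boolean = {
      // Local val inside the method: the lambda refers to this variable,
      // not to a field reached through `this`.
      val props = properties
      line => line.contains(props.getProperty("v1"))
    }
  }

  // True if obj can be written with plain Java serialization.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val p = new Properties()
    p.setProperty("v1", "a")
    val f = new Holder(p).predicate
    println(s"serializable: ${canSerialize(f)}, matches: ${f("banana")}")
  }
}
```

The same idea applies to command-line parameters in general: copy whatever the closure needs into local vals before building the closure. This is what the workaround in the question (val v1 = props.get("v1") inside run()) already does.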