Task not serializable when class is serializable
I have the following class in Scala:
case class A(a:Int, b:Int) extends Serializable
When I try the following in Spark 2.4 (via Databricks):
val textFile = sc.textFile(...)
val df = textFile.map(_=>new A(2,3)).toDF()
(Edit: the error happens when I call df.collect() or register it as a table.)
I get:
org.apache.spark.SparkException: Task not serializable
What am I missing?
I've tried adding encoders:
implicit def AEncoder: org.apache.spark.sql.Encoder[A] =
org.apache.spark.sql.Encoders.kryo[A]
and
import spark.implicits._
import org.apache.spark.sql.Encoders
Edit: I have also tried:
val df = textFile.map(_=>new A(2,3)).collect()
but no luck so far.
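For context, the usual cause of this error is not the case class itself (case classes are already serializable) but the closure around it: a class defined in a notebook or REPL cell becomes an inner class of the cell's wrapper object, so instances hold a hidden reference to that non-serializable outer object. A minimal sketch of the pattern and its fix (the names `Outer` and `Shapes` are hypothetical, not from the post):

```scala
import org.apache.spark.SparkContext

// Sketch: why a serializable case class can still trigger
// "Task not serializable" when defined inside another class.
class Outer {                      // Outer is NOT Serializable
  case class A(a: Int, b: Int)     // inner class: each A holds a $outer reference

  def run(sc: SparkContext): Unit = {
    val textFile = sc.textFile("...")
    // Serializing this task would also serialize A's $outer (this Outer) — fails.
    textFile.map(_ => A(2, 3)).collect()
  }
}

// Fix: define the case class at the top level, or inside a serializable
// object, so instances carry no outer reference.
object Shapes extends Serializable {
  case class A(a: Int, b: Int)
}
```

In a Databricks notebook the same effect can occur invisibly, because each cell is compiled into a wrapper class; moving the case class definition into its own cell or a packaged object avoids the capture.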
Sometimes this occurs intermittently on Databricks. Most annoying.
Restart the cluster and try again; I have hit this error occasionally, and after a restart it did not occur.
You can parse the file directly as a Dataset with the case class you have:
case class A(a:Int,b:Int) extends Serializable
val testRDD = spark.sparkContext.textFile("file:///test_file.csv")
val testDS = testRDD.map( line => line.split(",")).map(line_cols => A(line_cols(0).toInt, line_cols(1).toInt) ).toDS()
// res23: org.apache.spark.sql.Dataset[A] = [a: int, b: int]
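Equivalently, Spark's built-in CSV reader can produce the typed Dataset without the manual `split`, by supplying a schema that matches the case class and calling `.as[A]`. A sketch, assuming the same file path and a headerless comma-separated file:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

case class A(a: Int, b: Int) extends Serializable

// DDL-string schema matching A's fields; header=false is an assumption
// about the file layout.
val testDS = spark.read
  .option("header", "false")
  .schema("a INT, b INT")
  .csv("file:///test_file.csv")
  .as[A]
```

This keeps the parsing on the executors with Spark's CSV parser and avoids hand-rolled `toInt` conversions that throw on malformed rows.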