
Task not serializable when class is serializable

I have the following class in Scala:

case class A(a: Int, b: Int) extends Serializable

When I try the following in Spark 2.4 (via Databricks):

val textFile = sc.textFile(...)
val df = textFile.map(_ => new A(2, 3)).toDF()

(Edit: the error happens when I call df.collect() or register the result as a table.)

I get org.apache.spark.SparkException: Task not serializable.

What am I missing?

I've tried adding encoders:

implicit def AEncoder: org.apache.spark.sql.Encoder[A] = 
  org.apache.spark.sql.Encoders.kryo[A]

and

import spark.implicits._
import org.apache.spark.sql.Encoders

Edit: I have also tried:

val df = textFile.map(_ => new A(2, 3)).collect()

But no luck so far.

Sometimes this occurs intermittently on Databricks. Most annoying.

Restart the cluster and try again; I have hit this error occasionally, and after a restart it did not recur.
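
If a restart does not help, the usual culprit in a notebook or REPL is that a case class defined in a cell is compiled as an inner class, so the closure passed to map also captures the non-serializable wrapper object around it. A minimal sketch of the workaround, assuming you can move the definition into a top-level serializable container (the object name and file path below are illustrative, not from the original post):

// Sketch, assuming a notebook session with an active SparkSession `spark`.
// `Models` is a hypothetical container: nesting the case class in a
// serializable object keeps the closure below from dragging in a
// non-serializable notebook wrapper instance.
object Models extends Serializable {
  case class A(a: Int, b: Int)
}

import Models.A
import spark.implicits._

val textFile = spark.sparkContext.textFile("file:///some_input.txt") // hypothetical path
val df = textFile.map(_ => A(2, 3)).toDF()
df.collect() // should no longer throw "Task not serializable"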

You can directly parse the file as a Dataset using the case class you have:

case class A(a: Int, b: Int) extends Serializable

val testRDD = spark.sparkContext.textFile("file:///test_file.csv")
val testDS = testRDD
  .map(line => line.split(","))
  .map(line_cols => A(line_cols(0).toInt, line_cols(1).toInt))
  .toDS()

res23: org.apache.spark.sql.Dataset[A] = [a: int, b: int]
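
A hedged alternative sketch, assuming the same header-less two-column CSV of integers: let Spark's built-in CSV reader do the parsing and convert to a typed Dataset with as[A] (the DDL schema string below names the columns to match A's fields):

import spark.implicits._

// Sketch using the DataFrameReader instead of a hand-rolled split;
// assumes both columns are integers and the file has no header row.
val testDS2 = spark.read
  .schema("a INT, b INT") // column names match the fields of A
  .csv("file:///test_file.csv")
  .as[A]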
