
Task not serializable when class is serializable

I have the following class in Scala:

case class A(a: Int, b: Int) extends Serializable

When I try the following in Spark 2.4 (via Databricks):

val textFile = sc.textFile(...)
val df = textFile.map(_ => new A(2, 3)).toDF()

(Edit: the error happens when I call df.collect() or register the result as a table.)

I get org.apache.spark.SparkException: Task not serializable.

What am I missing?

I've tried adding encoders:

implicit def AEncoder: org.apache.spark.sql.Encoder[A] = 
  org.apache.spark.sql.Encoders.kryo[A]

and

import spark.implicits._
import org.apache.spark.sql.Encoders

Edit: I have also tried:

val df = textFile.map(_ => new A(2, 3)).collect()

But no luck so far.

Sometimes this occurs intermittently on Databricks. Most annoying.

Restart the cluster and try again; I have hit this error occasionally, and after a restart it did not recur.
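
If a restart does not help, the usual culprit in a notebook or REPL is that a case class defined in a cell is compiled as an inner class, so the closure passed to map also captures the non-serializable wrapper object around it. A minimal sketch of the workaround, assuming you can move the definition into a top-level serializable container (the object name and file path below are illustrative, not from the original post):

// Sketch, assuming a notebook session with an active SparkSession `spark`.
// `Models` is a hypothetical container: nesting the case class in a
// serializable object keeps the closure below from dragging in a
// non-serializable notebook wrapper instance.
object Models extends Serializable {
  case class A(a: Int, b: Int)
}

import Models.A
import spark.implicits._

val textFile = spark.sparkContext.textFile("file:///some_input.txt") // hypothetical path
val df = textFile.map(_ => A(2, 3)).toDF()
df.collect() // should no longer throw "Task not serializable"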

You can directly parse the file as a Dataset using the case class you have:

case class A(a: Int, b: Int) extends Serializable

val testRDD = spark.sparkContext.textFile("file:///test_file.csv")
val testDS = testRDD
  .map(line => line.split(","))
  .map(line_cols => A(line_cols(0).toInt, line_cols(1).toInt))
  .toDS()

res23: org.apache.spark.sql.Dataset[A] = [a: int, b: int]
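
A hedged alternative sketch, assuming the same header-less two-column CSV of integers: let Spark's built-in CSV reader do the parsing and convert to a typed Dataset with as[A] (the DDL schema string below names the columns to match A's fields):

import spark.implicits._

// Sketch using the DataFrameReader instead of a hand-rolled split;
// assumes both columns are integers and the file has no header row.
val testDS2 = spark.read
  .schema("a INT, b INT") // column names match the fields of A
  .csv("file:///test_file.csv")
  .as[A]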
