
Spark Kryo Serialization fails

I have a piece of Spark code that worked on Spark 1.3 but fails when I move it to Spark 1.5.2 (the cluster upgrade is out of my control). The failure is as follows:

Caused by: java.io.NotSerializableException: com.location.model.Profile
Serialization stack:
    - object not serializable (class: com.location.model.Profile, value: com.location.model.Profile@596032b0)
    - field (class: org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1, name: zeroValue$3, type: class java.lang.Object)
    - object (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1, <function0>)
    - field (class: org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1$$anonfun$1, name: $outer, type: class org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1)
    - object (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1$$anonfun$1, <function0>)
    - field (class: org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1$$anonfun$apply$10, name: createZero$1, type: interface scala.Function0)
    - object (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$aggregateByKey$1$$anonfun$apply$10, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)

The interesting piece is that the class at hand, Profile, is declared as class Profile() extends KryoSerializable and overrides the read/write methods of that interface.
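For reference, a hypothetical reconstruction of such a class (the real fields are not shown in the question, so the ones below are purely illustrative):

    import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
    import com.esotericsoftware.kryo.io.{Input, Output}

    // Implements Kryo's own serialization interface, not java.io.Serializable.
    class Profile() extends KryoSerializable {
      var id: Long = 0L        // illustrative fields
      var name: String = ""

      override def write(kryo: Kryo, output: Output): Unit = {
        output.writeLong(id)
        output.writeString(name)
      }

      override def read(kryo: Kryo, input: Input): Unit = {
        id = input.readLong()
        name = input.readString()
      }
    }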

I've also set this config for spark-submit: "--conf" -> "'spark.serializer=org.apache.spark.serializer.KryoSerializer'", and registered the Profile class with Kryo by doing conf.registerKryoClasses(Array( classOf[Profile], ...
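In code, the registration described above presumably looks roughly like this (a minimal sketch; the app name and any additional registered classes are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("profile-job")  // placeholder
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.registerKryoClasses(Array(classOf[Profile]))

    val sc = new SparkContext(conf)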

So all of this follows the instructions in the Spark Tuning guide, and it worked nicely before. Note that the exception shows a JavaSerializerInstance being used by the ClosureCleaner, and indeed if I add extends Serializable to the Profile class it works. But I'm not sure why it is using that serializer, nor why I should have to be compatible with Java serialization if I'm specifically asking for Kryo.
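For reference, the workaround mentioned above amounts to a declaration along these lines, keeping the Kryo read/write overrides unchanged:

    // Making the class Java-serializable as well satisfies the closure check.
    class Profile() extends KryoSerializable with Serializable {
      // ... same fields and read/write overrides as in the sketch above ...
    }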


Edit: I even removed the parameter altogether, since the code under registerKryoClasses sets the property in any case. In fact, I suspect Kryo serialization is being used (I added a println inside write and it does appear), but some kind of earlier validation is failing.
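For context, the serialization stack points at the zeroValue of an aggregateByKey call, so the failing code presumably has roughly this shape (method names and aggregation logic are hypothetical; only the use of a Profile as the zero value is taken from the trace):

    import org.apache.spark.rdd.RDD

    // A Profile instance is the zeroValue; Spark serializes it when it cleans
    // and ships the aggregation closures. seqOp/combOp bodies are placeholders.
    def buildProfiles(events: RDD[(String, String)]): RDD[(String, Profile)] =
      events.aggregateByKey(new Profile())(
        (profile, event) => profile,  // seqOp: fold one event into the profile
        (p1, p2) => p1                // combOp: merge two partial profiles
      )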

Have you tried to remove the ' characters from your submit? IMHO it should be

--conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer"

Do you submit from luigi, by any chance?
