Kryo registration of LabeledPoint class
I am trying to run a very simple Scala class in Spark with Kryo registration. This class just loads data from a file into an `RDD[LabeledPoint]`.
The code (adapted from the example at https://spark.apache.org/docs/latest/mllib-decision-tree.html):
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("test")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.kryo.registrationRequired", "true")
    // registerKryoClasses takes an Array[Class[_]] and must be called on the
    // SparkConf before the SparkContext is created: sc.getConf returns a copy,
    // so registering classes on it afterwards has no effect.
    conf.registerKryoClasses(Array(
      classOf[LabeledPoint],
      classOf[RDD[LabeledPoint]]
    ))
    val sc = new SparkContext(conf)

    // Load data
    val rawData = sc.textFile("data/mllib/sample_tree_data.csv")
    val data = rawData.map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts(0), Vectors.dense(parts.tail))
    }
    // map is lazy, so an action is needed for any task to run; collect() makes
    // each task send its partition back to the driver as an Array[LabeledPoint].
    data.collect()

    sc.stop()
    System.exit(0)
  }
}
```
My understanding is that, since I have set `spark.kryo.registrationRequired = true`, every class that gets serialized must be registered, which is why I have registered `RDD[LabeledPoint]` and `LabeledPoint`.
The problem
I receive the following error:

```
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.mllib.regression.LabeledPoint[]
Note: To register this class use: kryo.register(org.apache.spark.mllib.regression.LabeledPoint[].class);
at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:162)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
```
As I understand it, this means that the class `LabeledPoint[]` is not registered, even though I have registered the class `LabeledPoint`.
Furthermore, the code that the error message suggests for registering the class (`kryo.register(org.apache.spark.mllib.regression.LabeledPoint[].class);`) does not work.
Many thanks to @eliasah, who contributed a lot to this answer by pointing out that the suggested solution (`kryo.register(org.apache.spark.mllib.regression.LabeledPoint[].class);`) is written in Java, not in Scala.
Hence, the Java array type `LabeledPoint[]` is written `Array[LabeledPoint]` in Scala.
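The correspondence can be checked without Spark. The sketch below uses a stand-in case class `Point` (hypothetical, used in place of `LabeledPoint` so the example has no Spark dependency) and compares the element class with the array class. `classOf[Array[Point]]` is the same JVM class as Java's `Point[].class`, and it is a distinct `Class` object from `classOf[Point]`. That is why registering only the element class does not cover the array.

```scala
// Hypothetical stand-in for LabeledPoint, so this runs without Spark on the classpath.
case class Point(label: Double, features: Array[Double])

object ArrayClassDemo {
  // Scala's classOf[Array[Point]] corresponds to Java's Point[].class.
  val elementClass: Class[_] = classOf[Point]
  val arrayClass: Class[_] = classOf[Array[Point]]

  def main(args: Array[String]): Unit = {
    // For arrays of reference types, the JVM name is "[L" + element name + ";".
    println(arrayClass.getName)
    // The array's component type is the element class...
    println(arrayClass.getComponentType == elementClass)
    // ...but the two are different classes, so each needs its own registration.
    println(arrayClass == elementClass)
  }
}
```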
I solved my problem by registering the `Array[LabeledPoint]` class, i.e. by adding the following to my code (on the `SparkConf`, before the `SparkContext` is created):

```scala
conf.registerKryoClasses(Array(classOf[Array[org.apache.spark.mllib.regression.LabeledPoint]]))
```
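When several classes are involved, it can be convenient to register each class together with its array class in one call. The helper below is a hypothetical sketch (the names `KryoRegistrationHelper` and `withArrayClasses` are not part of Spark). It uses only `java.lang.reflect.Array.newInstance` to obtain each array class and returns the `Array[Class[_]]` shape that `SparkConf.registerKryoClasses` accepts.

```scala
import java.lang.reflect.{Array => JArray}

object KryoRegistrationHelper {
  /** Hypothetical helper: returns each given class followed by its
   *  one-dimensional array class (e.g. LabeledPoint and LabeledPoint[]).
   */
  def withArrayClasses(classes: Class[_]*): Array[Class[_]] =
    classes.flatMap { c =>
      // Creating an empty array is a portable way to get c's array class;
      // Class.arrayType() would also work, but only on Java 12 and later.
      Seq[Class[_]](c, JArray.newInstance(c, 0).getClass)
    }.toArray
}
```

A possible use, assuming the `conf` from the question: `conf.registerKryoClasses(KryoRegistrationHelper.withArrayClasses(classOf[LabeledPoint]))`. With `spark.kryo.registrationRequired` set to `true`, other classes reached while serializing each `LabeledPoint` (such as the vector class returned by `Vectors.dense`) may also need to be registered, depending on what the Spark version registers by default.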
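Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.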