
Spark, Kryo Serialization Issue with ProtoBuf field

I am seeing an error when running my Spark job, relating to serialization of a protobuf field while transforming an RDD.

com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
Serialization trace:
otherAuthors_ (com.thomsonreuters.kraken.medusa.dbor.proto.Book$DBBooks)

The error seems to be triggered at this point:

val booksPerTier = allTiers.map { tier =>
  (tier, books
    .filter(b => isInTier(endOfInterval, tier, b) && !isBookPublished(b))
    .mapPartitions(it =>
      it.map(ord => (ord.getAuthor, ord.getPublisherName, getGenre(ord.getSourceCountry)))))
}

val averagesPerAuthor = booksPerTier.flatMap { case (tier, opt) =>
  opt.map(o => (tier, o._1, PublisherCompanyComparison, o._3)).countByValue()
}

val averagesPerPublisher = booksPerTier.flatMap { case (tier, opt) =>
  opt.map(o => (tier, o._1, PublisherComparison(o._2), o._3)).countByValue()
}

The field is a list, specified in the protobuf as below:

otherAuthors_ = java.util.Collections.emptyList()

As you can see, the code does not actually use that field from the Book protobuf, although it is still being transmitted over the network.

Has anyone got any advice on this?

OK, old question, but here is an answer for future generations. The default Kryo serializers don't work well with some collections. There is a third-party library that helps with this: kryo-serializers
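The library is published on Maven Central; with sbt the dependency looks like this (the version here is just an example, pick one matching your Kryo):

libraryDependencies += "de.javakaffee" % "kryo-serializers" % "0.45"

To see why the defaults fail, here is a minimal standalone sketch of my own (not from the original post): Kryo's built-in CollectionSerializer deserializes a list by instantiating the concrete class and calling add() once per element, which throws UnsupportedOperationException for unmodifiable JDK lists like the ones protobuf-generated classes hold. The Objenesis fallback mirrors how Spark sets up Kryo; without it, the instantiation itself would fail.

import java.util.{Arrays, Collections}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import org.objenesis.strategy.StdInstantiatorStrategy

object UnmodifiableListRepro extends App {
  val kryo = new Kryo()
  // Spark (via chill) gives Kryo an Objenesis fallback so that classes
  // without a no-arg constructor can still be instantiated on read
  kryo.setInstantiatorStrategy(
    new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()))

  val output = new Output(1024)
  kryo.writeClassAndObject(output, Collections.unmodifiableList(Arrays.asList("a")))
  output.close()

  // The read side instantiates the unmodifiable wrapper and then calls
  // add(), which throws java.lang.UnsupportedOperationException
  kryo.readClassAndObject(new Input(output.toBytes))
}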

In your case you probably need to provide a custom Kryo registrator when creating the Spark config:

val conf = new SparkConf()
conf.set("spark.kryo.registrator", "MyKryoRegistrator")

With the needed custom registrations in your registrator:

import java.util.Collections
import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.CollectionsEmptyListSerializer
import de.javakaffee.kryoserializers.protobuf.ProtobufSerializer
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(Collections.EMPTY_LIST.getClass, new CollectionsEmptyListSerializer())
    // Probably should use the proto serializer for your proto classes
    kryo.register(classOf[Book], new ProtobufSerializer())
  }
}
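If other immutable JDK collections show up in the serialized object graph (protobuf-generated classes also hand out unmodifiable wrappers), kryo-serializers ships matching serializers for those too. A broader registrator might look like this; it's a sketch using class names from the kryo-serializers README, not from the original answer:

import java.util.Collections
import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.{CollectionsEmptyListSerializer, CollectionsEmptyMapSerializer, CollectionsEmptySetSerializer, UnmodifiableCollectionsSerializer}
import org.apache.spark.serializer.KryoRegistrator

class BroadKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(Collections.EMPTY_LIST.getClass, new CollectionsEmptyListSerializer())
    kryo.register(Collections.EMPTY_MAP.getClass, new CollectionsEmptyMapSerializer())
    kryo.register(Collections.EMPTY_SET.getClass, new CollectionsEmptySetSerializer())
    // One call covers all the Collections.unmodifiable* wrapper classes
    UnmodifiableCollectionsSerializer.registerSerializers(kryo)
  }
}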
