Java object serialization in Scala

I am new to Scala, so please bear with me. I have created a case class that encapsulates some information. One of the objects it takes is an instance of a Java class. Because I am using it in Spark, I need it to be serializable. How can I do that?

Java class

public class Currency {

    public Currency(final BigDecimal amount, final CurrencyUnit unit) {
        //Doing Something
    }
}

case class ReconEntity(inputCurrency : Currency, outputCurrency : Currency)

Using an implicit, I want to supply my own serialization code for Currency so that Spark can work on ReconEntity.

Firstly, have you tried some RDD operations using your Currency and ReconEntity classes? Do you actually get an error? Spark is able to handle RDD operations with apparently non-serializable Scala classes as values, at least (you can try this in the spark-shell, though this might require the Kryo serializer to be enabled).
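One way to answer the "do you actually get an error?" question outside Spark is to push a value through plain Java serialization, which is Spark's default serializer, and see whether it throws. The sketch below is only an illustration: PlainCurrency, Holder, and SerializableCheck are hypothetical names, and PlainCurrency stands in for the third-party Currency class. A Scala case class is Serializable in itself, but Java serialization walks every field, so a non-serializable field is still expected to fail, as Holder does here. Kryo may behave differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical stand-in for the third-party class: it does NOT implement Serializable.
class PlainCurrency {
    final BigDecimal amount;
    final String unitCode;

    PlainCurrency(BigDecimal amount, String unitCode) {
        this.amount = amount;
        this.unitCode = unitCode;
    }
}

// Serializable holder with a non-serializable field, similar in shape to ReconEntity.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    final PlainCurrency input;

    Holder(PlainCurrency input) {
        this.input = input;
    }
}

class SerializableCheck {
    /** Returns true if Java serialization can write the whole object graph. */
    static boolean isJavaSerializable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            // Some object reachable from 'value' does not implement Serializable.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling SerializableCheck.isJavaSerializable on a Holder that contains a PlainCurrency should return false, while a String or BigDecimal should return true.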

Since you state that you don't own the Currency class, you can't make it implement Serializable, which would be the simplest solution.

Another approach is to wrap the class with a serializable wrapper, as described in the article Beating Serialization in Spark. The example code is copied here for convenience:

For simple classes, it is easiest to make a wrapper interface that extends Serializable. This means that even though UnserializableObject cannot be serialized, we can pass in the following object without any issue:

public interface UnserializableWrapper extends Serializable {
  public UnserializableObject create(String prm1, String prm2);
}

The object can then be passed into an RDD or Map function using the following approach:

UnserializableWrapper usw = new UnserializableWrapper() {
  public UnserializableObject create(String prm1, String prm2) {
    return new UnserializableObject(prm1, prm2);
  }
};
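To see why this works, note that only the factory is serialized; the non-serializable object is created afterwards, on whichever side deserializes the factory. The following is a minimal sketch using plain Java serialization rather than Spark itself. UnserializableObject is a hypothetical stand-in for the article's class, and WrapperRoundTrip and its methods are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the article's class: it does NOT implement Serializable.
class UnserializableObject {
    final String first;
    final String second;

    UnserializableObject(String first, String second) {
        this.first = first;
        this.second = second;
    }
}

// Same shape as the wrapper interface quoted from the article.
interface UnserializableWrapper extends Serializable {
    UnserializableObject create(String prm1, String prm2);
}

class WrapperRoundTrip {
    // Built in a static method so the anonymous class has no enclosing
    // instance that would be dragged into the serialized form.
    static UnserializableWrapper newWrapper() {
        return new UnserializableWrapper() {
            @Override
            public UnserializableObject create(String prm1, String prm2) {
                return new UnserializableObject(prm1, prm2);
            }
        };
    }

    // Serializes the wrapper to bytes and reads it back, as a serializer would
    // when shipping it to another JVM.
    static UnserializableWrapper roundTrip(UnserializableWrapper wrapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(wrapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UnserializableWrapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The copy returned by roundTrip can build a fresh UnserializableObject with create, even though that object never crossed the serialization boundary. The limitation is that the wrapper carries a recipe, not data: any values needed by create have to travel as serializable arguments such as the two strings here.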

If the class is merely a data structure, without significant methods, then it might be easier to unpack its fields into your RDD types (in your case, ReconEntity) and discard the class itself.
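As a rough sketch of the unpacking idea, the record below stores only serializable field values and rebuilds Currency objects on demand. It is written in Java to match the snippets above; a Scala case class with the same four fields should behave the same way. Everything here is an assumption made for illustration: the Currency shown is a stand-in with hypothetical getAmount and getUnitCode accessors and a String unit code in place of CurrencyUnit, and ReconRow is an invented name.

```java
import java.io.Serializable;
import java.math.BigDecimal;

// Hypothetical stand-in for the third-party class (not Serializable).
// The real Currency takes a CurrencyUnit; a String code is used here
// only to keep the sketch self-contained.
class Currency {
    private final BigDecimal amount;
    private final String unitCode;

    Currency(BigDecimal amount, String unitCode) {
        this.amount = amount;
        this.unitCode = unitCode;
    }

    BigDecimal getAmount() {
        return amount;
    }

    String getUnitCode() {
        return unitCode;
    }
}

// Flat, serializable replacement for ReconEntity: it holds field values only.
class ReconRow implements Serializable {
    private static final long serialVersionUID = 1L;

    final BigDecimal inputAmount;
    final String inputUnit;
    final BigDecimal outputAmount;
    final String outputUnit;

    private ReconRow(BigDecimal inputAmount, String inputUnit,
                     BigDecimal outputAmount, String outputUnit) {
        this.inputAmount = inputAmount;
        this.inputUnit = inputUnit;
        this.outputAmount = outputAmount;
        this.outputUnit = outputUnit;
    }

    // Unpack the two Currency objects into plain serializable fields.
    static ReconRow from(Currency input, Currency output) {
        return new ReconRow(input.getAmount(), input.getUnitCode(),
                            output.getAmount(), output.getUnitCode());
    }

    // Rebuild Currency objects only where the original API is needed.
    Currency inputCurrency() {
        return new Currency(inputAmount, inputUnit);
    }

    Currency outputCurrency() {
        return new Currency(outputAmount, outputUnit);
    }
}
```

Rows would be created with ReconRow.from before the data enters an RDD, and inputCurrency or outputCurrency called inside a task when a real Currency is required.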

If the class has methods that you need, then your other (ugly) option is to cut-and-paste code into a new serializable class or into helper functions in your Spark code.
