
Hadoop Serialization Nested Objects

I have a class:

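 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
 import java.io.Serializable;
 import org.apache.hadoop.io.Writable;

 class Class1 implements Writable {
     int intField;
     double doubleField;
     Class2 refToClass2;

     // Writable's deserialization method is readFields(DataInput)
     public void readFields(DataInput in) throws IOException {...}
     public void write(DataOutput out) throws IOException {...}
 }

 class Class2 implements Serializable, Writable {
     ....
 }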
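For reference, here is a minimal sketch of how a Writable that holds another Writable usually delegates to it. To keep it compilable without Hadoop on the classpath, it declares a local stand-in `Writable` interface with the same two methods as `org.apache.hadoop.io.Writable`. The `id` field in Class2 and the `NestedWritableDemo` helper are made up for illustration, since the original elides what Class2 contains.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring org.apache.hadoop.io.Writable (assumption: used
// only so this sketch compiles without the Hadoop jar).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class Class2 implements Writable {
    long id; // hypothetical field; the original does not show Class2's contents

    public void write(DataOutput out) throws IOException { out.writeLong(id); }
    public void readFields(DataInput in) throws IOException { id = in.readLong(); }
}

class Class1 implements Writable {
    int intField;
    double doubleField;
    Class2 refToClass2 = new Class2(); // kept non-null so readFields can fill it in

    public void write(DataOutput out) throws IOException {
        out.writeInt(intField);
        out.writeDouble(doubleField);
        refToClass2.write(out); // the nested object writes its own fields
    }

    public void readFields(DataInput in) throws IOException {
        // Read in exactly the order write() produced.
        intField = in.readInt();
        doubleField = in.readDouble();
        refToClass2.readFields(in);
    }
}

class NestedWritableDemo {
    // Serializes src to a byte array and reads it back into a fresh instance.
    static Class1 roundTrip(Class1 src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            Class1 copy = new Class1();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Class1 src = new Class1();
        src.intField = 7;
        src.doubleField = 2.5;
        src.refToClass2.id = 42L;
        Class1 copy = roundTrip(src);
        System.out.println(copy.intField + " " + copy.doubleField + " " + copy.refToClass2.id);
    }
}
```

The nested reference is initialized eagerly here because `readFields` is typically called on an instance created through the no-argument constructor; if the field may legitimately be null, a presence flag would have to be written first.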

Hadoop throws this error on the reducer side when using Class1 as an output value:

 java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:961)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:892)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:393)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:354)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:476)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:61)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)

My intuition tells me that the problem is related to Class1 or, more likely, to Class2, which implements both Serializable and Writable.

Any ideas?

UPDATE:

I've narrowed the problem down to Class1, which I've now changed to implement only Writable (not Serializable as well). I've also changed it so that it no longer contains a reference to Class2. I still get the same error. If I replace Class1 with another Writable implementation as the output value, it works! Why?

The problem was that I was making a stupid mistake: I was not updating a jar. So, basically, Class1 was not implementing the Writable interface in the old jar that was actually in use.
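A stale jar like this can be spotted by asking the JVM where it actually loaded the class from and whether that loaded version implements the expected interface. The following is only a sketch using standard reflection; `JarCheck` and its method names are made up for illustration, and the defaults in `main` are placeholders you would replace with your own class and `org.apache.hadoop.io.Writable`.

```java
import java.security.CodeSource;

class JarCheck {
    // Returns the jar or directory the class was loaded from, if the JVM reports one.
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    // True if the loaded version of c implements (or extends) the named interface.
    // The interface is resolved through c's own class loader, without initializing it.
    static boolean implementsInterface(Class<?> c, String interfaceName) {
        try {
            Class<?> iface = Class.forName(interfaceName, false, c.getClassLoader());
            return iface.isAssignableFrom(c);
        } catch (ClassNotFoundException e) {
            return false; // the interface itself is not visible to that class loader
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Placeholder defaults; pass e.g. "your.pkg.Class1 org.apache.hadoop.io.Writable".
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        String interfaceName = args.length > 1 ? args[1] : "java.io.Serializable";
        Class<?> c = Class.forName(className);
        System.out.println(className + " loaded from: " + locationOf(c));
        System.out.println("implements " + interfaceName + ": "
                + implementsInterface(c, interfaceName));
    }
}
```

Run it with the same classpath the job uses; if the reported location is not the jar you just built, or the interface check prints false, the job is picking up an outdated class.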

As a general observation: the underlying cause of the error in the original post is that Hadoop can't find a Serializer for a type you are trying to serialize, either directly or indirectly (e.g. by using that type as an output key/value). Hadoop fails to find a Serializer for one of two reasons:

  1. Your type is not serializable (i.e. it implements neither Writable nor Serializable).
  2. No Serializer is available to Hadoop for the kind of serialization your type implements (e.g. your type implements Writable, but for one reason or another Hadoop cannot use the org.apache.hadoop.io.serializer.WritableSerialization class).
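The two cases can be illustrated with a simplified model of the lookup. This is not Hadoop's code: `SerializerLookupModel`, `Framework` and `find` are hypothetical names, and `Writable` here is a local stand-in interface. The assumption behind the model is that Hadoop walks its enabled serializations (configured through the `io.serializations` property) and picks the first one that accepts the class, with nothing found when none accepts it.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class SerializerLookupModel {
    // Stand-in for org.apache.hadoop.io.Writable so the sketch needs no Hadoop jar.
    interface Writable {}

    // One enabled serialization framework: a name plus an "accepts this class?" test.
    static final class Framework {
        final String name;
        final Predicate<Class<?>> accepts;

        Framework(String name, Predicate<Class<?>> accepts) {
            this.name = name;
            this.accepts = accepts;
        }
    }

    static final Framework WRITABLE =
            new Framework("writable", c -> Writable.class.isAssignableFrom(c));
    static final Framework JAVA =
            new Framework("java", c -> Serializable.class.isAssignableFrom(c));

    // The first enabled framework that accepts the type wins; null means no serializer.
    static String find(List<Framework> enabled, Class<?> type) {
        for (Framework f : enabled) {
            if (f.accepts.test(type)) {
                return f.name;
            }
        }
        return null;
    }

    static class WritableValue implements Writable {}
    static class JavaOnlyValue implements Serializable {}
    static class PlainValue {}

    public static void main(String[] args) {
        List<Framework> writableOnly = Arrays.asList(WRITABLE);
        System.out.println(find(writableOnly, WritableValue.class)); // writable
        System.out.println(find(writableOnly, PlainValue.class));    // null: reason 1
        System.out.println(find(writableOnly, JavaOnlyValue.class)); // null: reason 2
        System.out.println(find(Arrays.asList(WRITABLE, JAVA), JavaOnlyValue.class)); // java
    }
}
```

In the stale-jar situation described above, the class that was actually loaded behaves like `PlainValue`: nothing accepts it, the lookup comes back empty, and the caller fails when it tries to use the result.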
