
Hadoop Serialization and De-Serialization

The file I need to process is stored in HDFS in a binary stream format, and I have to do some processing over it using MapReduce. The input file is split into a number of blocks (the file is still in its original format when it reaches an input split). My question is: when does this de-serialization occur? I have implemented the Writable interface in my code, with its two methods, readFields and write. Are these methods responsible for the de-serialization and serialization of the actual data stored in HDFS? If yes, could you please explain the flow of data? I have been stuck on this concept for the whole day. Please help.
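For reference, a custom Writable typically looks like the minimal sketch below: write() serializes the object's fields to a byte stream, and readFields() reconstructs them in the same order. (The SensorReading class name and its fields are hypothetical illustrations, not from the original post.)

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    // A minimal custom Writable. write() serializes the fields to a byte
    // stream; readFields() deserializes them back in the same order.
    public class SensorReading implements Writable {

        private long timestamp;
        private double value;

        public SensorReading() { }          // no-arg constructor required by the framework

        public SensorReading(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(timestamp);       // serialization: fields -> bytes
            out.writeDouble(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            timestamp = in.readLong();      // de-serialization: bytes -> fields,
            value = in.readDouble();        // read in the exact order they were written
        }
    }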

Serialization occurs during the write method on the Context object in the mapper phase: when you call context.write(key, value) with your own object, serialization starts. Once the map output is written to the local disk, the shuffle-and-sort (SS) phase comes into the picture. In this phase the intermediate output is processed by the framework, and this is where de-serialization happens, via readFields(). You can see the serialized data after the mapper.
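To make that flow concrete, here is a sketch of a mapper whose context.write() call is the point where the framework invokes write() on the key and value to serialize them into the map output buffer; readFields() is then called when the shuffled data is read back. (The SensorMapper class and the comma-separated input parsing are illustrative assumptions, and SensorReading refers to the hypothetical Writable sketched above.)

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper over text input of the form "sensorId,timestamp,value".
    public class SensorMapper
            extends Mapper<LongWritable, Text, Text, SensorReading> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split(",");
            SensorReading reading =
                    new SensorReading(Long.parseLong(parts[1]),
                                      Double.parseDouble(parts[2]));
            // Serialization starts here: the framework calls write() on
            // both the Text key and the SensorReading value.
            context.write(new Text(parts[0]), reading);
        }
    }

Note that this works for map output values; map output keys must additionally implement WritableComparable, because the framework sorts by key during the shuffle-and-sort phase.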
