Hadoop Serialization and De-Serialization

The file I need to process is stored in HDFS as a binary stream, and I have to run a MapReduce job over it. The input file is split into a number of blocks (the file is still in its serialized, on-disk form when it reaches an input split). My question is: when does this deserialization occur? I have the Writable interface implemented in my code, and it has two methods, readFields and write. Are these methods responsible for the deserialization and serialization of the actual data stored in HDFS? If yes, could you please explain the flow of data? I've been stuck on this concept all day. Please help.
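
For reference, a minimal sketch of what such a Writable implementation typically looks like (the class name, field names, and setter are hypothetical, not taken from the question's actual code): write serializes the fields to a binary stream, and readFields reads them back in exactly the same order.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom Writable: the fields and setter are illustrative.
public class RecordWritable implements Writable {
    private long id;
    private String payload;

    // The framework needs a no-argument constructor: it creates an empty
    // instance reflectively and then calls readFields() to populate it.
    public RecordWritable() {}

    public void set(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);   // serialize the fields in a fixed order
        out.writeUTF(payload);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();  // deserialize in exactly the same order
        payload = in.readUTF();
    }
}
```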

Serialization occurs when the write method is called on the Context object in the map phase: when your code calls context.write(key, value) with your own Writable object, the framework invokes that object's write(DataOutput) method to serialize it. Once the map output has been written to the local disk, the shuffle-and-sort phase comes into the picture; the framework processes the intermediate output there, and that is where deserialization happens, via readFields(DataInput). So the data you can see after the mapper is already in serialized form.
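
To make that concrete, here is a sketch of a mapper that emits the hypothetical RecordWritable from the sketch in the question above; the call to context.write is the exact point where the framework invokes the value object's write method.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper, continuing the RecordWritable sketch above.
public class RecordMapper
        extends Mapper<LongWritable, Text, LongWritable, RecordWritable> {

    // Reuse one instance per task; the framework copies the serialized
    // bytes into its output buffer on each context.write() call.
    private final RecordWritable record = new RecordWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        record.set(key.get(), value.toString()); // illustrative population
        context.write(key, record); // serialization happens here, via record.write()
    }
}
```

Note that RecordWritable is used only as a value here, so plain Writable suffices; a type used as a key must implement WritableComparable so the shuffle can sort it.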
