
How to read a Hadoop sequence file?

I have a sequence file that is the output of a Hadoop MapReduce job. In this file the data is written as key/value pairs, and the value itself is a map. I want to read the value as a Map object so that I can process it further.

    Configuration config = new Configuration();
    Path path = new Path("D:\\OSP\\sample_data\\data\\part-00000");
    SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
    WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
    Writable value = (Writable) reader.getValueClass().newInstance();
    long position = reader.getPosition();

    while(reader.next(key,value))
    {
           System.out.println("Key is: "+textKey +" value is: "+val+"\n");
    }

Output of the program: Key is: [this is key] value is: {abc=839177, xyz=548498, lmn=2, pqr=1}

Here I am getting the value as a string, but I want it as a Map object.

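Check the API documentation for SequenceFile.Reader#next(Writable, Writable): it reads the next key/value pair in the file into the key and value objects you pass to it, and returns false at the end of the file. The loop in the question prints textKey and val, which next() never fills in.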

    while(reader.next(key,value))
    {
        System.out.println("Key is: "+textKey +" value is: "+val+"\n");
    }

should be replaced with:

    while(reader.next(key,value))
    {
        System.out.println("Key is: "+key +" value is: "+value+"\n");
    }
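A minimal sketch of the corrected loop that also turns each value into a java.util.Map. It assumes the value class is org.apache.hadoop.io.MapWritable (the {abc=839177, ...} output suggests a map, but that is an assumption to verify against the file header), and it uses the Hadoop 2+ SequenceFile.Reader.file(...) options constructor rather than the deprecated FileSystem-based one. The class name ReadSequenceFileAsMap and the helper toJavaMap are hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class ReadSequenceFileAsMap {

    // Hypothetical helper: copies a MapWritable into a plain java.util.Map,
    // converting keys and values to strings with toString().
    public static Map<String, String> toJavaMap(MapWritable mw) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<Writable, Writable> e : mw.entrySet()) {
            result.put(e.getKey().toString(), e.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // path to the part-00000 file

        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            // Instantiate key/value objects of the types recorded in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            // next() refills key and value on every call and returns false at end of file.
            while (reader.next(key, value)) {
                if (value instanceof MapWritable) {
                    // Assumption: the job wrote MapWritable values.
                    Map<String, String> map = toJavaMap((MapWritable) value);
                    System.out.println("Key is: " + key + " map is: " + map);
                } else {
                    // Not a MapWritable: show the declared type so it can be checked.
                    System.out.println("Key is: " + key + " value ("
                        + reader.getValueClassName() + ") is: " + value);
                }
            }
        }
    }
}
```

Because next() reuses the same value object (MapWritable clears itself before reading each record), the sketch copies every record into a new HashMap instead of keeping a reference to value.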

Use SequenceFile.Reader#getValueClassName to find out the value type stored in the SequenceFile; a SequenceFile records its key and value types in the file header. If the value type implements Map, such as MapWritable, you can cast value to that class and work with it as a map.
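For example, a small sketch that only opens the file and returns the two class names stored in its header. PrintSequenceFileTypes and headerTypes are hypothetical names, and the sketch again assumes the Hadoop 2+ options-based SequenceFile.Reader constructor.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class PrintSequenceFileTypes {

    // Returns {keyClassName, valueClassName} as recorded in the SequenceFile header.
    public static String[] headerTypes(Configuration conf, Path path) throws IOException {
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            return new String[] { reader.getKeyClassName(), reader.getValueClassName() };
        }
    }

    public static void main(String[] args) throws IOException {
        String[] types = headerTypes(new Configuration(), new Path(args[0]));
        System.out.println("Key class:   " + types[0]);
        System.out.println("Value class: " + types[1]);
    }
}
```

If the value class printed is org.apache.hadoop.io.MapWritable, casting value to MapWritable is safe; if it is something like org.apache.hadoop.io.Text, the map was written as a string and would have to be parsed instead.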
