How to read a Hadoop MapFile using Python?
I have a MapFile that is block-compressed using DefaultCodec. The MapFile is created by a Java application like this:
MapFile.Writer writer =
    new MapFile.Writer(conf, path,
        MapFile.Writer.keyClass(IntWritable.class),
        MapFile.Writer.valueClass(BytesWritable.class),
        MapFile.Writer.compression(SequenceFile.CompressionType.BLOCK, new DefaultCodec()));
This file is stored in HDFS, and I need to read some key/value pairs from it in another application written in Python. I can't find any library that can do that. Do you have any suggestions or examples?
Thanks
Create a reader as follows:
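# Assumes a Python package that provides the hadoop.io module.
from hadoop.io import MapFile, LongWritable

path = '/hdfs/path/to/file'
# The key and value objects must match the classes the file was written with;
# for the file in the question that is IntWritable/BytesWritable, not LongWritable.
key = LongWritable()
value = LongWritable()
reader = MapFile.Reader(path)
while reader.next(key, value):
    print('%s %s' % (key, value))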
Check out these hadoop.io.MapFile Python examples.
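Before choosing the key and value types for a reader, it can help to confirm what the file actually contains. A MapFile is a directory holding two SequenceFiles, `data` and `index`, and each starts with a header naming the key class, the value class and the codec. The sketch below is a minimal pure-Python header parser, assuming SequenceFile version 6; the function names `read_vint`, `read_text` and `read_sequence_file_header` are made up for this illustration, and the sample bytes are synthetic rather than taken from a real file.

```python
import io
import struct


def read_vint(stream):
    """Decode a Hadoop variable-length integer (as written by WritableUtils)."""
    first = struct.unpack('b', stream.read(1))[0]  # signed first byte
    if first >= -112:
        return first  # small values are stored in a single byte
    negative = first < -120
    extra = (-120 - first) if negative else (-112 - first)
    magnitude = 0
    for byte in bytearray(stream.read(extra)):  # big-endian payload
        magnitude = (magnitude << 8) | byte
    return ~magnitude if negative else magnitude


def read_text(stream):
    """Read a length-prefixed UTF-8 string (Hadoop Text encoding)."""
    length = read_vint(stream)
    return stream.read(length).decode('utf-8')


def read_sequence_file_header(stream):
    """Parse the leading header fields of a version 6 SequenceFile."""
    if stream.read(3) != b'SEQ':
        raise ValueError('missing SEQ magic; not a SequenceFile')
    version = bytearray(stream.read(1))[0]
    if version != 6:
        raise ValueError('this sketch only handles version 6, got %d' % version)
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = stream.read(1) != b'\x00'
    block_compressed = stream.read(1) != b'\x00'
    codec = read_text(stream) if compressed else None
    # Metadata and the sync marker follow; they are not parsed here.
    return {
        'version': version,
        'key_class': key_class,
        'value_class': value_class,
        'compressed': compressed,
        'block_compressed': block_compressed,
        'codec': codec,
    }


def _text(s):
    """Encode a short string (under 128 bytes) the way Hadoop Text does."""
    data = s.encode('utf-8')
    return bytes(bytearray([len(data)])) + data


# Synthetic header mimicking the writer settings from the question.
sample = (
    b'SEQ\x06'
    + _text('org.apache.hadoop.io.IntWritable')
    + _text('org.apache.hadoop.io.BytesWritable')
    + b'\x01\x01'  # compressed, block-compressed
    + _text('org.apache.hadoop.io.compress.DefaultCodec')
)

print(read_sequence_file_header(io.BytesIO(sample)))
```

For a real file, one option is to copy the MapFile's `data` file to local disk first (for example with `hdfs dfs -get`), open it in binary mode, and pass the file object to `read_sequence_file_header`. This only reads the header; decoding the block-compressed records themselves is not covered here.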