
Reading a simple Avro file from HDFS

I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system:

FileReader<GenericRecord> fileReader =
        DataFileReader.openReader(new File(filename), new GenericDatumReader<GenericRecord>());

for (GenericRecord datum : fileReader) {
    String value = datum.get(1).toString();
    System.out.println("value = " + value);
}

fileReader.close();

My file is in HDFS, however. I cannot give openReader a Path or an FSDataInputStream. How can I simply read an Avro file in HDFS?

EDIT: I got this to work by creating a custom class (SeekableHadoopInput) that implements SeekableInput. I "stole" this from "Ganglion" on GitHub. Still, it seems like there should be a built-in Hadoop/Avro integration path for this.
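For reference, an adapter along the lines described in the edit might look like the sketch below. This is a reconstruction based on the description above (a class wrapping Hadoop's FSDataInputStream behind Avro's SeekableInput interface), not Ganglion's exact code; the class and constructor shape are assumptions.

```java
import java.io.IOException;

import org.apache.avro.file.SeekableInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: adapts an HDFS stream to Avro's SeekableInput so
// DataFileReader.openReader(SeekableInput, DatumReader) can use it.
public class SeekableHadoopInput implements SeekableInput {
    private final FSDataInputStream in;
    private final long length;

    public SeekableHadoopInput(Path path, Configuration conf) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        this.length = fs.getFileStatus(path).getLen(); // file size, for length()
        this.in = fs.open(path);                       // seekable HDFS stream
    }

    @Override
    public void seek(long p) throws IOException {
        in.seek(p);
    }

    @Override
    public long tell() throws IOException {
        return in.getPos();
    }

    @Override
    public long length() throws IOException {
        return length;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

This is essentially what the FsInput class in the accepted answer already provides, so rolling your own is only necessary if you cannot depend on avro-mapred.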

Thanks

The FsInput class (in the avro-mapred submodule, since it depends on Hadoop) can do this. It provides the seekable input stream that is needed for Avro data files.

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Path path = new Path("/path/on/hdfs");
Configuration config = new Configuration(); // make this your Hadoop env config
SeekableInput input = new FsInput(path, config);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);

for (GenericRecord datum : fileReader) {
    System.out.println("value = " + datum);
}

fileReader.close(); // also closes underlying FsInput
