Reading a simple Avro file from HDFS
I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system:
FileReader<GenericRecord> fileReader = DataFileReader.openReader(new File(filename), new GenericDatumReader<GenericRecord>());
for (GenericRecord datum : fileReader) {
    String value = datum.get(1).toString();
    System.out.println("value = " + value);
}
fileReader.close();
My file is in HDFS, however. I cannot give openReader a Path or an FSDataInputStream. How can I simply read an Avro file in HDFS?
EDIT: I got this to work by creating a custom class (SeekableHadoopInput) that implements SeekableInput. I "stole" this from "Ganglion" on github. Still, it seems like there should be a Hadoop/Avro integration path for this.
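For reference, a wrapper like the one described above can be sketched roughly as follows. This is a minimal sketch, not the exact class from the Ganglion repository; the field names and constructor shape are my own, and it assumes the Hadoop and Avro libraries are on the classpath:

```java
import java.io.IOException;

import org.apache.avro.file.SeekableInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Adapts an HDFS input stream to Avro's SeekableInput interface,
// which DataFileReader.openReader needs for random access.
public class SeekableHadoopInput implements SeekableInput {
    private final FSDataInputStream in;
    private final long length;

    public SeekableHadoopInput(Path path, Configuration conf) throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        this.in = fs.open(path);
        this.length = fs.getFileStatus(path).getLen();
    }

    @Override
    public void seek(long p) throws IOException {
        in.seek(p); // FSDataInputStream supports positioned seeks
    }

    @Override
    public long tell() throws IOException {
        return in.getPos();
    }

    @Override
    public long length() throws IOException {
        return length; // file size captured at open time
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

An instance of this class can then be passed straight to `DataFileReader.openReader(input, new GenericDatumReader<GenericRecord>())`, just like a local `File`.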
Thanks
The FsInput class (in the avro-mapred submodule, since it depends on Hadoop) can do this. It provides the seekable input stream that is needed for Avro data files.
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Path path = new Path("/path/on/hdfs");
Configuration config = new Configuration(); // make this your Hadoop env config
SeekableInput input = new FsInput(path, config);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);
for (GenericRecord datum : fileReader) {
    System.out.println("value = " + datum);
}
fileReader.close(); // also closes underlying FsInput