
How to extract schema from an avro file in Java

How do you extract first the schema and then the data from an avro file in Java? Identical to this question, except in Java.

I've seen examples of how to get the schema from an avsc file, but not from an avro file. What direction should I be looking in?

Schema schema = new Schema.Parser().parse(
    new File("/home/Hadoop/Avro/schema/emp.avsc")
);

If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:

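// No schema is passed to the reader: it uses the writer schema stored in the file itself.
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema); // Schema.toString() renders the schema as JSON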

And then you can read the data inside the file:

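GenericRecord record = null;
while (dataFileReader.hasNext()) {
    // Passing the previous record lets Avro reuse the object instead of allocating a new one.
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close(); // release the underlying file handle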
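
The reader can return a schema without any .avsc file because an Avro object container file carries the writer schema in its header: the magic bytes "Obj" followed by 1, then a metadata map whose "avro.schema" entry holds the schema as JSON, then a 16-byte sync marker. The following is a minimal sketch, using only the JDK, of pulling that JSON out by hand. The class and method names (AvroHeaderSchema, readSchemaJson, readLong, readBytes) are made up for this illustration and are not part of the Avro library; for real work the DataFileReader shown above is the better choice.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): reads only the header of an Avro container file.
public class AvroHeaderSchema {

    /** Reads one Avro long: a variable-length, zigzag-encoded integer. */
    public static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated Avro header");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;          // high bit clear: last byte of the varint
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);         // undo zigzag encoding
    }

    /** Reads a length-prefixed byte sequence (used for both map keys and values). */
    public static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    /** Returns the JSON text stored under "avro.schema" in the file header. */
    public static String readSchemaJson(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        String schema = null;
        // The metadata map is a series of blocks; a block count of 0 ends the map.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in);                    // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) {
                    schema = new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return schema;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AvroHeaderSchema <file.avro>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

Running it as java AvroHeaderSchema file.avro prints the same JSON that dataFileReader.getSchema() would give you, without touching the data blocks that follow the header.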

Thanks to @Helder Pereira for the answer. As a complement, the schema can also be fetched from getSchema() of a GenericRecord instance.
Here is a live demo of it; the link shows how to get data and schema in Java for the Parquet, ORC and AVRO data formats.

You can use the Databricks library as shown here https://github.com/databricks/spark-avro which will load the avro file into a DataFrame (Dataset<Row>).

Once you have a Dataset<Row>, you can directly get the schema using df.schema().

