
If the Avro schema is stored with the data, why does the Java Avro API need me to supply a schema file?

In some cases, Microsoft Azure dumps data in Avro format. From my perspective, the data in question is simply JSON records, so I just want my JSON data back from the Avro file.

I am looking at how to 'deserialize' Avro data, and the examples here:

https://avro.apache.org/docs/1.8.1/gettingstartedjava.html

make the claim:

Data in Avro is always stored with its corresponding schema, meaning we can always read a serialized item regardless of whether we know the schema ahead of time.

Unfortunately, the examples do require knowing the schema ahead of time:

DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);

I must be missing something. I just want my data out of Avro in text/JSON format. Is there any way of doing that without knowing the schema? Can't Avro just read it from the file itself?
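For comparison, here is a minimal sketch, assuming Avro 1.8.x or later on the classpath, of the two lines quoted above with the schema argument removed. `GenericDatumReader` also has a no-argument constructor; when `DataFileReader` opens the file, it reads the writer's schema from the file header and hands it to the datum reader. The class name `ReadWithoutSchema`, the `Person` schema and the temporary sample file are hypothetical and exist only so the sketch can run on its own.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class ReadWithoutSchema {

    // Hypothetical schema, used only to produce a sample file.
    static final Schema PERSON = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}");

    static GenericRecord person(String name, int age) {
        GenericRecord r = new GenericData.Record(PERSON);
        r.put("name", name);
        r.put("age", age);
        return r;
    }

    // Writes two sample records; DataFileWriter stores PERSON in the file header.
    static File writeSample() throws IOException {
        File file = File.createTempFile("sample", ".avro");
        file.deleteOnExit();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(PERSON))) {
            writer.create(PERSON, file);
            writer.append(person("alice", 30));
            writer.append(person("bob", 25));
        }
        return file;
    }

    // Reads every record without being given the schema up front.
    static List<GenericRecord> readAll(File file) throws IOException {
        // No schema argument: the writer's schema is taken from the file header.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        List<GenericRecord> records = new ArrayList<>();
        try (DataFileReader<GenericRecord> dataFileReader =
                 new DataFileReader<>(file, datumReader)) {
            System.out.println("Embedded schema: " + dataFileReader.getSchema());
            for (GenericRecord record : dataFileReader) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        for (GenericRecord r : readAll(writeSample())) {
            // GenericRecord.toString() renders the record as JSON-like text.
            System.out.println(r);
        }
    }
}
```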

Why write code when there's already a tool to get JSON?

java -jar avro-tools-1.8.2.jar tojson data.avro > output.json

http://central.maven.org/maven2/org/apache/avro/avro-tools/1.8.2/avro-tools-1.8.2.jar

Otherwise, your file has its schema embedded in it, and you'd have to extract that schema first before reading the file contents, which is exactly what the source code of the above tool does:

https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L77
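The two steps that tool performs (take the schema from the file, then re-encode each record as JSON) can be sketched in plain Java as follows. This is a minimal sketch, assuming Avro 1.8.x or later on the classpath, and not a copy of `DataFileReadTool`; the class name `AvroToJson` and the `Event` sample schema are hypothetical. `EncoderFactory.get().jsonEncoder(schema, out)` produces Avro's own JSON encoding of each record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class AvroToJson {

    // Streams every record of an Avro container file to out, JSON-encoded.
    static void toJson(InputStream avroIn, OutputStream out) throws IOException {
        try (DataFileStream<Object> stream =
                 new DataFileStream<>(avroIn, new GenericDatumReader<Object>())) {
            // Step 1: extract the writer's schema from the file header.
            Schema schema = stream.getSchema();
            // Step 2: re-encode each record as JSON using that same schema.
            DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
            JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
            for (Object datum : stream) {
                writer.write(datum, encoder);
            }
            encoder.flush();
        }
    }

    // Convenience wrapper for in-memory Avro bytes.
    static String toJsonString(byte[] avro) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        toJson(new ByteArrayInputStream(avro), out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Hypothetical sample data so the sketch runs without an Azure file.
    static byte[] sampleAvro() throws IOException {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"size\",\"type\":\"int\"}]}");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, bytes);
            GenericRecord r = new GenericData.Record(schema);
            r.put("id", "evt-1");
            r.put("size", 42);
            writer.append(r);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Usage: java AvroToJson data.avro > output.json
            toJson(new FileInputStream(args[0]), System.out);
        } else {
            System.out.println(toJsonString(sampleAvro()));
        }
    }
}
```

No schema is supplied by the caller at any point; the only schema used is the one `DataFileStream` reads from the file.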

You need to provide the reader's schema so that Avro can perform schema resolution.
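Schema resolution in the Java API can be sketched as follows, assuming Avro 1.8.x or later on the classpath. Passing a schema to `GenericDatumReader(Schema)` sets the reader's (expected) schema; when `DataFileStream` opens the data, it fills in the writer's schema from the file header, and Avro resolves one against the other while decoding. If no reader's schema is given, the writer's schema is used for both. The class name and both schemas are hypothetical: the reader's schema drops the `age` field and adds an `email` field with a default value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class ResolveWithReaderSchema {

    // Hypothetical writer's schema: what the data was written with.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}");

    // Hypothetical reader's schema: drops "age", adds "email" with a default.
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"unknown\"}]}");

    // Writes one record with the writer's schema embedded in the header.
    static byte[] writeSample() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(WRITER))) {
            writer.create(WRITER, bytes);
            GenericRecord r = new GenericData.Record(WRITER);
            r.put("name", "alice");
            r.put("age", 30);
            writer.append(r);
        }
        return bytes.toByteArray();
    }

    // Reads the first record, resolving the file's schema against READER.
    static GenericRecord readFirst(byte[] avro) throws IOException {
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(READER);
        try (DataFileStream<GenericRecord> stream =
                 new DataFileStream<>(new ByteArrayInputStream(avro), datumReader)) {
            return stream.next();
        }
    }

    public static void main(String[] args) throws IOException {
        // The resolved record has "name" from the data and "email" from the default.
        System.out.println(readFirst(writeSample()));
    }
}
```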
