
If the Avro schema is stored with the data, why does the Java Avro API need me to supply a schema file?

In some cases, Microsoft Azure dumps data in Avro format. From my perspective, the data in question is simply JSON records, so I just want my JSON data back from the Avro file.

I am looking at how to "deserialize" Avro data, and the examples here:

https://avro.apache.org/docs/1.8.1/gettingstartedjava.html

make the claim:

Data in Avro is always stored with its corresponding schema, meaning we can always read a serialized item regardless of whether we know the schema ahead of time.

Unfortunately, the examples do require knowing the schema ahead of time:

DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader);

I must be missing something. I just want my data out of Avro, in text/JSON format. Is there any way of doing that without knowing the schema? Can't Avro just read it out of the file itself?

Why write code when there's already a tool to get JSON?

java -jar avro-tools-1.8.2.jar tojson data.avro > output.json

http://central.maven.org/maven2/org/apache/avro/avro-tools/1.8.2/avro-tools-1.8.2.jar

Otherwise, your file does contain its schema, and that schema has to be extracted before the file contents can be read, which is exactly what the source code of the tool above does:

https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L77
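
To make "extracting the schema first" concrete, here is a minimal sketch in plain Java, with no Avro dependency, that reads the header of an Avro object container file and returns the embedded writer schema. It assumes the container layout described in the Avro specification: the magic bytes `Obj` followed by the byte 1, then a metadata map whose `avro.schema` entry holds the schema as JSON. The class and method names are made up for illustration, and this is not how the linked tool is written; the tool goes through Avro's own reader classes.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: pulls the embedded schema out of an Avro container file header.
class AvroHeaderSchema {

    // Avro longs are zigzag-encoded varints: 7 payload bits per byte, lowest bits first.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1); // undo the zigzag encoding
    }

    // Avro "bytes" and "string" values are a long length followed by that many bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        long length = readLong(in);
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("invalid length: " + length);
        }
        byte[] buf = new byte[(int) length];
        in.readFully(buf);
        return buf;
    }

    // Reads the magic bytes and the metadata map that open every container file.
    static Map<String, byte[]> readHeaderMetadata(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        long count;
        // A map is written as blocks of key/value pairs, ended by a block count of 0.
        while ((count = readLong(in)) != 0) {
            if (count < 0) {
                count = -count;
                readLong(in); // a negative count is followed by the block size in bytes
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    // Returns the writer schema JSON stored under the "avro.schema" metadata key.
    static String readWriterSchema(InputStream in) throws IOException {
        byte[] schema = readHeaderMetadata(in).get("avro.schema");
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return new String(schema, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readWriterSchema(in));
        }
    }
}
```

In application code there is normally no need to parse the header by hand: constructing `GenericDatumReader` with its no-argument constructor lets `DataFileReader` take the schema from the file, and `dataFileReader.getSchema()` returns it.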

You need to provide the reader's schema so that Avro can perform schema resolution.
