
Decoding Json data to Avro classes

I have some file records that are stored as plain-text JSON. A sample record:

{
      "datasetID": "Orders",
      "recordID": "rid1",
      "recordGroupID":"asdf1",
      "recordType":"asdf1",
      "recordTimestamp": 100,
      "recordPartitionTimestamp": 100,
      "recordData":{
        "customerID": "cid1",
        "marketplaceID": "mid1",
        "quantity": 10,
        "buyingDate": "1481353448",
        "orderID" : "oid1"
      }
}

For each record, recordData can be null. If recordData is present, orderID may be null.

I wrote the following Avro schema to represent this structure:

[{
  "namespace":"model",
  "name":"OrderRecordData",
  "type":"record",
  "fields":[
    {"name":"marketplaceID","type":"string"},
    {"name":"customerID","type":"string"},
    {"name":"quantity","type":"long"},
    {"name":"buyingDate","type":"string"},
    {"name":"orderID","type":["null", "string"]}
  ]
},
{
  "namespace":"model",
  "name":"Order",
  "type":"record",
  "fields":[
    {"name":"datasetID","type":"string"},
    {"name":"recordID","type":"string"},
    {"name":"recordGroupID","type":"string"},
    {"name":"recordType","type":"string"},
    {"name":"recordTimestamp","type":"long"},
    {"name":"recordPartitionTimestamp","type":"long"},
    {"name":"recordData","type": ["null", "model.OrderRecordData"]}
  ]
}]

Finally, I use the following method to deserialize each String record into my Avro class:

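// Needs java.io.IOException, org.apache.avro.io.DecoderFactory
// and org.apache.avro.specific.SpecificDatumReader.
Order jsonDecodeToAvro(String inputString) throws IOException {
    return new SpecificDatumReader<Order>(Order.class)
        .read(null, DecoderFactory.get().jsonDecoder(Order.SCHEMA$, inputString));
}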

However, when I try to decode the record above, I always get this exception:

org.apache.avro.AvroTypeException: Unknown union branch customerID
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)

What am I doing wrong? I am using JDK 8 and Avro 1.7.7.

The JSON input must have the following form:

{
  "datasetID": "Orders",
  "recordID": "rid1",
  "recordGroupID": "asdf1",
  "recordType": "asdf1",
  "recordTimestamp": 100,
  "recordPartitionTimestamp": 100,
  "recordData": {
    "model.OrderRecordData": {
      "orderID": null,
      "customerID": "cid1",
      "marketplaceID": "mid1",
      "quantity": 10,
      "buyingDate": "1481353448"
    }
  }
}

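This is because of the way Avro's JSON encoding handles unions and null: a non-null union value must be wrapped in an object whose single key is the name of the branch type (here model.OrderRecordData), while a null value is written as a bare null. Your input puts the record's fields directly under recordData, so the decoder reads customerID as a branch name and fails.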
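
To make the wrapping rule concrete, here is a minimal sketch that uses only the JDK and does not call Avro at all. The class AvroUnionJson and its helpers wrapUnion and toJson are hypothetical names invented for this illustration, and the tiny JSON printer does no string escaping. It shows how the sample record would have to be shaped for the schema above, assuming a non-null orderID is wrapped under its "string" branch in the same way:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Avro library.
final class AvroUnionJson {

    // Shapes a value the way Avro's JSON encoding expects for a union:
    // null stays a bare null, anything else becomes a single-entry object
    // keyed by the branch's type name.
    static Object wrapUnion(String branchName, Object value) {
        if (value == null) {
            return null;
        }
        Map<String, Object> wrapped = new LinkedHashMap<>();
        wrapped.put(branchName, value);
        return wrapped;
    }

    // Minimal JSON printer for nested maps, strings, numbers and nulls.
    // It does not escape special characters, so it is not general purpose.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return "\"" + value + "\"";
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(",");
                sb.append("\"").append(e.getKey()).append("\":").append(toJson(e.getValue()));
                first = false;
            }
            return sb.append("}").toString();
        }
        return String.valueOf(value); // numbers
    }

    public static void main(String[] args) {
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("marketplaceID", "mid1");
        data.put("customerID", "cid1");
        data.put("quantity", 10L);
        data.put("buyingDate", "1481353448");
        // Union ["null", "string"]: a non-null value goes under "string".
        data.put("orderID", wrapUnion("string", "oid1"));

        Map<String, Object> order = new LinkedHashMap<>();
        order.put("datasetID", "Orders");
        order.put("recordID", "rid1");
        order.put("recordGroupID", "asdf1");
        order.put("recordType", "asdf1");
        order.put("recordTimestamp", 100L);
        order.put("recordPartitionTimestamp", 100L);
        // Union ["null", "model.OrderRecordData"]: key is the record's full name.
        order.put("recordData", wrapUnion("model.OrderRecordData", data));

        System.out.println(toJson(order));
    }
}
```

The printed string is the kind of input you would pass to jsonDecodeToAvro. If you cannot change how the files are written, a preprocessing step along these lines (built on a real JSON library rather than this toy printer) is one way to adapt them before decoding.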

Take a look at this related question:

How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?

There is also an open issue about this: https://issues.apache.org/jira/browse/AVRO-1582
