Using "default" in an Avro schema

Question

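According to the definition of the "default" attribute in the Avro documentation: "A default value for this field, used when reading instances that lack this field (optional)."

This means that if the corresponding field is missing, the default value is used.

But that does not seem to be the case. Consider the following Student schema: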

{
  "type": "record",
  "namespace": "com.example",
  "name": "Student",
  "fields": [
    {
      "name": "age",
      "type": "int",
      "default": -1
    },
    {
      "name": "name",
      "type": "string",
      "default": "null"
    }
  ]
}

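The schema says: if the "age" field is missing, treat its value as -1. The same goes for the "name" field.

Now, if I try to build a Student model from the following JSON: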

{"age":70}

I get this exception:

org.apache.avro.AvroTypeException: Expected string. Got END_OBJECT

    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698)
    at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:227)

It looks like the default value does not work as expected. So what exactly does "default" do here?

Here is the code used to build the Student model:

Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, studentJson);
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
return datumReader.read(null, decoder);

(The Student class was generated automatically by the Avro compiler from the Student schema.)

Answer 1

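I think there is some misunderstanding around default values, so hopefully my explanation will help other people as well. A default value is useful for supplying a value when a field is not present, but it is applied when you instantiate the Avro object (in your case, when calling datumReader.read), by resolving the schema the data was written with against the schema you read it with. It does not let you read data without knowing the schema it was written with, which is why the concept of a "schema registry" is useful in this kind of situation.

The following code works and lets you read your data: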

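import org.apache.avro.Schema;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

// Writer schema: the schema the JSON input was actually written with (no "name" field)
Schema writerSchema = new Schema.Parser().parse("{\n" +
        "  \"type\": \"record\",\n" +
        "  \"namespace\": \"com.example\",\n" +
        "  \"name\": \"Student\",\n" +
        "  \"fields\": [{\n" +
        "    \"name\": \"age\",\n" +
        "    \"type\": \"int\",\n" +
        "    \"default\": -1\n" +
        "  }\n" +
        "  ]\n" +
        "}");

// Parse the JSON with the writer schema, since that is the shape of the input
Decoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, "{\"age\":70}");

// The reader schema comes from the generated class (Student.SCHEMA$)
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
// setSchema() sets the writer schema; reader fields missing from it get their defaults
datumReader.setSchema(writerSchema);
System.out.println(datumReader.read(null, decoder));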

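As you can see, I specify the schema that was used to "write" the JSON input, which does not contain the field "name". Because your (reader) schema declares a default value for "name", when you print the record you will see name set to that default: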

{"age": 70, "name": "null"}

Just in case you didn't know: "null" here is not a null value; it is a string whose value is "null".
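To make the resolution rule in this answer concrete, here is a small stdlib-only Java sketch that models it with plain maps. It is a toy model, not Avro's API: the class ToyResolver, the method resolve, and the NO_DEFAULT marker are hypothetical names, and "schemas" are reduced to sets of field names plus a map from reader field name to default value. It follows the rule described above: a field the writer wrote keeps its written value, a reader field the writer's schema lacks takes the reader's default (or fails if there is none), and writer-only fields are dropped. Set.of assumes Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of record schema resolution (hypothetical; not part of the Avro library).
public class ToyResolver {
    // Hypothetical marker meaning "this reader field declares no default".
    public static final Object NO_DEFAULT = new Object();

    /**
     * @param writerFields field names in the writer's schema
     * @param written      values as decoded with the writer's schema
     * @param readerFields reader field name -> default value (or NO_DEFAULT)
     */
    public static Map<String, Object> resolve(Set<String> writerFields,
                                              Map<String, Object> written,
                                              Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (writerFields.contains(name)) {
                // The writer wrote this field: keep the written value.
                result.put(name, written.get(name));
            } else if (field.getValue() != NO_DEFAULT) {
                // Missing from the writer's schema: fall back to the reader's default.
                result.put(name, field.getValue());
            } else {
                throw new IllegalArgumentException("No value and no default for field " + name);
            }
        }
        // Fields known only to the writer are never copied into the result.
        return result;
    }

    public static void main(String[] args) {
        // Reader schema from the question: age defaults to -1, name to the string "null".
        Map<String, Object> reader = new LinkedHashMap<>();
        reader.put("age", -1);
        reader.put("name", "null");

        // Writer schema with only "age", like the one in this answer.
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("age", 70);

        Map<String, Object> record = resolve(Set.of("age"), written, reader);
        System.out.println(record.get("age"));                    // 70
        System.out.println(record.get("name") instanceof String); // true: the string "null", not a null reference
    }
}
```

The key point the sketch mirrors is that the default only comes into play when the writer's schema lacks the field; in the question, the data was declared as written with Student.SCHEMA$, which includes "name", so no default was ever consulted.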

Answer 2

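Just adding to what was already said in the answer above: if you want a field to be null when it is absent, make its type a union with null. Otherwise the default is just a string spelled "null", and that is what gets filled in. Example schema for the field: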

{
  "name": "name",
  "type": ["null", "string"],
  "default": null
}

Then, if you add {"age":70} and read the record back, you get:

{"age":70,"name":null}
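The schema above lists "null" as the first branch of the union for a reason: the Avro specification (at least through the 1.11 releases) states that the default value of a union-typed field must match the first branch of the union. The stdlib-only sketch below models that check; ToyUnionDefault, fits, and validUnionDefault are hypothetical helpers, not Avro API, and only a few primitive type names are modelled, with a JSON default represented as the corresponding Java object (JSON null as a null reference).

```java
import java.util.List;

// Toy check of the union-default rule (hypothetical helper; not part of the Avro library).
public class ToyUnionDefault {
    /** Returns true if a default value (as a Java object) fits the given primitive Avro type name. */
    static boolean fits(String avroType, Object defaultValue) {
        switch (avroType) {
            case "null":    return defaultValue == null;
            case "string":  return defaultValue instanceof String;
            case "int":     return defaultValue instanceof Integer;
            case "boolean": return defaultValue instanceof Boolean;
            default: throw new IllegalArgumentException("Type not modelled: " + avroType);
        }
    }

    /** A union field's default must match the union's first branch. */
    public static boolean validUnionDefault(List<String> branches, Object defaultValue) {
        return !branches.isEmpty() && fits(branches.get(0), defaultValue);
    }

    public static void main(String[] args) {
        System.out.println(validUnionDefault(List.of("null", "string"), null));   // true: the schema in this answer
        System.out.println(validUnionDefault(List.of("string", "null"), null));   // false: null is not the first branch
        System.out.println(validUnionDefault(List.of("null", "string"), "null")); // false: "null" is a string, not null
    }
}
```

The last case is the question's original mistake seen from the other side: a default written as "null" (with quotes) is a string value and never a null reference.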


2 answers

Answer 1
2 2018-02-26 16:09:59

Answer 2
0 2021-08-02 07:18:43
