
Schema mismatch when converting data between two schemas using aliases in fastavro

I'm trying to convert some data that matches the schema old_schema to the field names used in new_schema using aliases.

I've been at it for too long and can't see what is wrong with this code:

from fastavro import writer, reader, json_writer
from fastavro.schema import parse_schema
from io import BytesIO

# Sample data
input_json = [
    {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
    }
]

# Old schema that matches the input_json
old_schema = parse_schema({
    "type": "record",
    "namespace": "com.node40",
    "name": "generated",
    "fields": [
        {
            "name": "key1",
            "type": "string"
        },
        {
            "name": "key2",
            "type": "string"
        },
        {
            "name": "key3",
            "type": "string"
        }
    ]
})

# New schema with old schema names as aliases
new_schema = parse_schema({
    "type": "record",
    "namespace": "com.node40",
    "name": "test",
    "fields": [
        {
            "name": "k1",
            "type": "string",
            "aliases": ["key1"]
        },
        {
            "name": "k2",
            "type": "string",
            "aliases": ["key2"]
        },
        {
            "name": "k3",
            "type": "string",
            "aliases": ["key3"]
        }
    ]
})
records = [
    {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
    }
]

# Write to buffer as serialized avro using old_schema
buffer = BytesIO()
writer(buffer, old_schema, input_json, validator=True)
buffer.seek(0)

# Read serialized avro from buffer, deserialize and write to json file
input_avro = reader(buffer, new_schema)
json_writer('fitted_data.json', new_schema, input_avro)

This results in a SchemaResolutionError from fastavro. This is such a simple example, but I just can't see what is wrong with it. Help appreciated!

The main problem is that your old schema is named generated with a namespace of com.node40. The new schema has the same namespace, but is named test. The Avro resolution rules state that, for these records to match, "both schemas are records with the same (unqualified) name".
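To make the name rule concrete, here is a minimal pure-Python sketch of the check. The helper names unqualified and record_names_match are hypothetical and exist only for illustration; this is a simplified reading of the rule, not fastavro's actual implementation.

```python
def unqualified(name):
    """Strip any namespace prefix: 'com.node40.generated' -> 'generated'."""
    return name.rsplit(".", 1)[-1]


def record_names_match(writer_schema, reader_schema):
    """Simplified illustration (hypothetical helper, not fastavro code).

    The writer's record name must equal the reader's record name or one of
    the reader's record-level aliases, compared by unqualified name.
    """
    writer_name = unqualified(writer_schema["name"])
    candidates = {unqualified(reader_schema["name"])}
    candidates.update(unqualified(a) for a in reader_schema.get("aliases", []))
    return writer_name in candidates


old = {"type": "record", "namespace": "com.node40", "name": "generated"}
new = {"type": "record", "namespace": "com.node40", "name": "test"}
new_with_alias = dict(new, aliases=["com.node40.generated"])

mismatch = record_names_match(old, new)            # names differ, no alias
resolved = record_names_match(old, new_with_alias)  # alias bridges the names
```

Under this reading, the field-level aliases in the question are never consulted, because resolution already stops at the record name.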

So you can either rename the new schema to match the old one, or use aliases again and do the following on the new schema:

new_schema = {
    "type": "record",
    "namespace": "com.node40",
    "name": "test",
    "aliases": ["com.node40.generated"],
    ...
}

Note: Technically you should only have to write "aliases": ["generated"], but it looks like there is a bug in fastavro where that case is not handled correctly. Using the fully namespaced name will work.

After you do all that, your example will still fail, because at the very end you have json_writer('fitted_data.json', new_schema, input_avro). json_writer expects an open file object rather than a file name, so that should be changed to:

with open('fitted_data.json', 'w') as fo:
    json_writer(fo, new_schema, input_avro)
