Schema mismatch converting data between 2 schemas using aliases in fastavro
I'm trying to convert some data that matches the schema old_schema to the field names used in new_schema using aliases. I've been at it for too long and can't see what is wrong with this code:
from fastavro import writer, reader, json_writer
from fastavro.schema import parse_schema
from io import BytesIO

# Sample data
input_json = [
    {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
    }
]

# Old schema that matches the input_json
old_schema = parse_schema({
    "type": "record",
    "namespace": "com.node40",
    "name": "generated",
    "fields": [
        {
            "name": "key1",
            "type": "string"
        },
        {
            "name": "key2",
            "type": "string"
        },
        {
            "name": "key3",
            "type": "string"
        }
    ]
})

# New schema with old schema names as aliases
new_schema = parse_schema({
    "type": "record",
    "namespace": "com.node40",
    "name": "test",
    "fields": [
        {
            "name": "k1",
            "type": "string",
            "aliases": ["key1"]
        },
        {
            "name": "k2",
            "type": "string",
            "aliases": ["key2"]
        },
        {
            "name": "k3",
            "type": "string",
            "aliases": ["key3"]
        }
    ]
})

records = [
    {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
    }
]

# Write to buffer as serialized avro using old_schema
buffer = BytesIO()
writer(buffer, old_schema, input_json, validator=True)
buffer.seek(0)

# Read serialized avro from buffer, deserialize and write to json file
input_avro = reader(buffer, new_schema)
json_writer('fitted_data.json', new_schema, input_avro)
This results in a SchemaResolutionError from fastavro. This is such a simple example, but I just can't see what is wrong with it. Help appreciated!
The main problem is that your old schema is named generated with a namespace of com.node40. The new schema has the same namespace, but is named test. The Avro schema resolution rules state that for these records to match, "both schemas are records with the same (unqualified) name".
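The name check described above can be illustrated with a small pure-Python sketch. This is not fastavro's implementation; the helper names (unqualified, namespace_of, full_name, record_names_match) are hypothetical and exist only for this illustration. It follows the Avro specification's rule that an alias without a dot is relative to the namespace of the schema it is declared on.

```python
def unqualified(name):
    """Return the part of a (possibly dotted) name after the last dot."""
    return name.rsplit(".", 1)[-1]


def namespace_of(schema):
    """Namespace of a named schema: taken from a dotted name, else 'namespace'."""
    name = schema["name"]
    if "." in name:
        return name.rsplit(".", 1)[0]
    return schema.get("namespace", "")


def full_name(schema):
    """Fully qualified name, e.g. 'com.node40.generated'."""
    ns = namespace_of(schema)
    short = unqualified(schema["name"])
    return f"{ns}.{short}" if ns else short


def record_names_match(writer_schema, reader_schema):
    """Illustrative check: same unqualified name, or a reader alias names the writer."""
    if unqualified(writer_schema["name"]) == unqualified(reader_schema["name"]):
        return True
    reader_ns = namespace_of(reader_schema)
    for alias in reader_schema.get("aliases", []):
        # An alias without a dot is resolved relative to the reader's namespace.
        if "." not in alias and reader_ns:
            alias = f"{reader_ns}.{alias}"
        if alias == full_name(writer_schema):
            return True
    return False


old = {"type": "record", "namespace": "com.node40", "name": "generated"}
new = {"type": "record", "namespace": "com.node40", "name": "test"}
new_with_alias = dict(new, aliases=["com.node40.generated"])

print(record_names_match(old, new))             # names differ, no alias
print(record_names_match(old, new_with_alias))  # alias names the writer schema
```

With the schemas from the question, the first check fails because generated and test differ, which is the situation that leads to the SchemaResolutionError.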
So you can either rename the new schema to match the old one, or again use aliases and do the following on the new schema:
new_schema = {
    "type": "record",
    "namespace": "com.node40",
    "name": "test",
    "aliases": ["com.node40.generated"],
    ...
}
Note: Technically you should only have to write "aliases": ["generated"], but it looks like there is a bug in fastavro where that case is not handled correctly. Putting the fully namespaced name will work.
After you do all that, your example will still fail because at the very end you have json_writer('fitted_data.json', new_schema, input_avro). json_writer expects an open file object rather than a file name, so that should be changed to:
with open('fitted_data.json', 'w') as fo:
    json_writer(fo, new_schema, input_avro)