Parsing multiple Avro schemas (avsc files) that refer to each other using Python (fastavro)

I have an Avro schema that currently lives in a single avsc file, shown below. I want to move the address record into a separate, common avsc file that many other avsc files can reference, so Customer and address will be separate avsc files. How can I split them so that the customer avsc file references the address avsc file? And how can both files then be processed with Python? I currently use fastavro in Python 3 to process the single avsc file, but I am open to any other utility in Python 3 or PySpark.

File name: customer_details.avsc

[
{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {
            "name": "streetaddress",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "zip",
            "type": "string"
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "phone",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}
]

import fastavro

s1 = fastavro.schema.load_schema('customer_details.avsc')

How can I split the schema into separate files so that the address record file can be referenced from other avsc files? And how would I then process multiple avsc files using fastavro or any other Python utility?

To do this, the schema for AddressRecord should be in a file called com.company.model.AddressRecord.avsc with the following contents:

{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {
            "name": "streetaddress",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "zip",
            "type": "string"
        }
    ]
}

The Customer schema doesn't strictly need a special naming convention since it is the top-level schema, but it's probably a good idea to follow the same convention. So it would be in a file named com.company.model.Customer.avsc with the following contents:

{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "phone",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}
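
If the combined file already exists, the split into per-record files can be scripted rather than done by hand. The following is a minimal sketch using only the standard library. The helper name split_schema_file is hypothetical (it is not part of fastavro), it assumes the combined file is a top-level JSON list of named schemas as in the question, and the demo uses a trimmed-down version of the schema to stay short.

```python
import json
import tempfile
from pathlib import Path


def split_schema_file(combined_path, out_dir):
    """Write each named schema in a combined .avsc list to <full name>.avsc.

    Hypothetical helper for illustration; not part of fastavro.
    """
    schemas = json.loads(Path(combined_path).read_text())
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for schema in schemas:
        name = schema["name"]
        namespace = schema.get("namespace")
        # Avro full name: a dotted name is already complete,
        # otherwise it is "<namespace>.<name>" when a namespace is given.
        if "." in name or not namespace:
            fullname = name
        else:
            fullname = f"{namespace}.{name}"
        target = out_dir / f"{fullname}.avsc"
        target.write_text(json.dumps(schema, indent=4))
        written.append(target.name)
    return written


# Trimmed-down version of the schema from the question (assumption: fewer fields).
combined = [
    {"type": "record", "namespace": "com.company.model", "name": "AddressRecord",
     "fields": [{"name": "city", "type": "string"}]},
    {"type": "record", "namespace": "com.company.model", "name": "Customer",
     "fields": [{"name": "address",
                 "type": {"type": "array",
                          "items": "com.company.model.AddressRecord"}}]},
]

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "customer_details.avsc"
    source.write_text(json.dumps(combined))
    # All output files land in one directory, one file per record.
    written_files = split_schema_file(source, Path(tmp) / "split")
    print(written_files)
```

The Customer file keeps referring to the address record by its full name, com.company.model.AddressRecord, so no change to the schema bodies is needed when splitting.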

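The files must be in the same directory, because load_schema resolves an unknown named type by looking for a file named after the type's full name, with an .avsc extension, next to the schema being loaded.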

Then you should be able to do fastavro.schema.load_schema('com.company.model.Customer.avsc')
