
How to generate Kafka Schema from JSON object

I have sample JSON data with some nested keys holding different value types, such as integers, floats and strings:

{
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {
        "city": "Mountain View",
        "state": "CA",
        "zipcode": 94041
    }
}

I need to write a schema to register in the Kafka Schema Registry so that this sample JSON data can be serialized with JSON_SR, AVRO or Protobuf.

Is there any generator library for Python or Node that can take the JSON data object as an input and output a Kafka schema for one of the three serializers (JSON_SR, AVRO or Protobuf)?

Below is an example of a JSON Schema used to define an object with three fields:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/myURI.schema.json",
  "title": "SampleRecord",
  "description": "Sample schema to help you get started.",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "myField1": {
      "type": "integer",
      "description": "The integer type is used for integral numbers."
    },
    "myField2": {
      "type": "number",
      "description": "The number type is used for any numeric type, either integers or floating point numbers."
    },
    "myField3": {
      "type": "string",
      "description": "The string type is used for strings of text."
    }
  }
}


There is actually nothing Kafka-specific about the schema that is registered with the Schema Registry; it's really just a plain JSON Schema, Avro schema or Protobuf schema.

To narrow it down a bit, assuming you're using the Python client and choose to serialize with JSON, the way to go is:

  • Create a JSON Schema for your data. As said above, there's nothing Kafka-specific about that step. Crafting it manually would be my recommendation (see my closing note below), although if you prefer to generate it, a tool like jsonformatter or jsonschema.net might be what you're looking for.
  • Use Confluent's Python serializing producer and configure it to use the JSONSerializer.
  • Configure the JSONSerializer to point to the Schema Registry and set its schema_str parameter to the schema you obtained above, as shown in the sketch after this list.
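
Putting those three steps together, here is a minimal sketch with confluent-kafka-python. The topic name, broker address and registry URL are placeholders, and the schema string is a hand-written JSON Schema for the sample record above, not something generated:

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.serialization import StringSerializer

# Hand-written JSON Schema for the sample order record (nothing Kafka-specific here)
schema_str = """
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Order",
  "type": "object",
  "properties": {
    "ordertime": {"type": "integer"},
    "orderid": {"type": "integer"},
    "itemid": {"type": "string"},
    "address": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "state": {"type": "string"},
        "zipcode": {"type": "integer"}
      }
    }
  }
}
"""

# Placeholder URLs -- point these at your own registry and cluster
schema_registry_client = SchemaRegistryClient({"url": "http://localhost:8081"})
json_serializer = JSONSerializer(schema_str, schema_registry_client)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": json_serializer,
})

order = {
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Mountain View", "state": "CA", "zipcode": 94041},
}
producer.produce(topic="orders", key=str(order["orderid"]), value=order)
producer.flush()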

If you choose to use Avro or Protobuf instead, then the actual question is how to convert the JSON data into an Avro or Protobuf Python object, which again is not Kafka-specific. Once that step is done, the same pattern as above can be used, replacing the JSONSerializer with the one for Avro or Protobuf.
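
For Avro, a plain dict matching the record shape serializes directly, so the swap is small. A sketch under the same placeholder assumptions, with a hand-written Avro schema mirroring the JSON Schema above:

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

# Hand-written Avro schema for the same order record
avro_schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "ordertime", "type": "long"},
    {"name": "orderid", "type": "int"},
    {"name": "itemid", "type": "string"},
    {"name": "address", "type": {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "state", "type": "string"},
        {"name": "zipcode", "type": "long"}
      ]
    }}
  ]
}
"""

schema_registry_client = SchemaRegistryClient({"url": "http://localhost:8081"})
# Note the argument order differs from JSONSerializer: the registry client comes first
avro_serializer = AvroSerializer(schema_registry_client, avro_schema_str)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": avro_serializer,
})

order = {
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Mountain View", "state": "CA", "zipcode": 94041},
}
producer.produce(topic="orders", key=str(order["orderid"]), value=order)
producer.flush()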

Note that it's often a better idea to craft your schema manually instead of using a generator, and to think carefully about what's optional, what should be a union type, and so on, since you want to keep the ability to update it in the future while respecting the Schema Registry compatibility rules; an auto-generated schema might be too restrictive or not future-proof. Also, there are some technology limitations you want to take into account, e.g. Protobuf 3 is not able to distinguish missing values from default values, so you might decide to use wrappers or similar.
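
For completeness, if you still want a machine-generated starting point for the JSON Schema case, a Python library such as genson can infer a draft schema from sample data; this is my suggestion, not anything Kafka-specific, and for the reasons above you'd want to loosen and annotate the output by hand afterwards:

from genson import SchemaBuilder

sample = {
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Mountain View", "state": "CA", "zipcode": 94041},
}

builder = SchemaBuilder()
builder.add_object(sample)        # infer types from the sample record
print(builder.to_json(indent=2))  # e.g. "zipcode" comes out as {"type": "integer"}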

The answer is no. I had a similar issue; the major challenge was to generate the key and value schemas.

The solution was to create an application that builds a custom mapping between the JSON and Kafka schemas.
