
How to generate Kafka Schema from JSON object

I have sample JSON data with some nested keys holding different value types, such as integers, floats and strings:

{
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {
        "city": "Mountain View",
        "state": "CA",
        "zipcode": 94041
    }
}

I need to write a schema to register in the Kafka Schema Registry so that this sample JSON data can be serialized with JSON_SR, AVRO or Protobuf.

Is there any generator library for Python or Node that can take the JSON data object as input and output a Kafka schema for one of the three serializers (JSON_SR, AVRO or Protobuf)?

Below is an example of a JSON Schema that defines an object with three fields:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://example.com/myURI.schema.json",
  "title": "SampleRecord",
  "description": "Sample schema to help you get started.",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "myField1": {
      "type": "integer",
      "description": "The integer type is used for integral numbers."
    },
    "myField2": {
      "type": "number",
      "description": "The number type is used for any numeric type, either integers or floating point numbers."
    },
    "myField3": {
      "type": "string",
      "description": "The string type is used for strings of text."
    }
  }
}

There is actually nothing Kafka-specific about the schemas used with the Schema Registry; they are just plain JSON Schema, Avro or Protobuf schemas.

To narrow it down a bit, assuming you're using the Python client and choose to serialize with JSON, the way to go is:

  • create a JSON Schema for your data. As said above, there is nothing Kafka-specific about that step. Crafting it manually would be my recommendation (see my closing note below), although if you prefer to generate it, a tool like jsonformatter or jsonschema.net might be what you're looking for (a Python sketch with one such library follows this list)
  • use Confluent's Python SerializingProducer and configure it to use the JSONSerializer
  • configure the JSONSerializer to point to the Schema Registry and set its schema_str parameter to the schema you obtained above (see the producer sketch below)
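
For the generation step, here is a minimal sketch using the genson package (one Python library that infers a JSON Schema from a sample object; picking it is my assumption, not something the tools above require):

from genson import SchemaBuilder
import json

# Feed the sample record to the builder; it infers a draft JSON Schema.
builder = SchemaBuilder()
builder.add_object({
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Mountain View", "state": "CA", "zipcode": 94041},
})
print(json.dumps(builder.to_schema(), indent=2))

And for the producer side, a minimal sketch with confluent-kafka's SerializingProducer and JSONSerializer; the broker address (localhost:9092), Schema Registry URL (localhost:8081) and topic name ("orders") are assumptions for illustration:

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.json_schema import JSONSerializer
from confluent_kafka.serialization import StringSerializer

# Hand-written JSON Schema for the sample record, per the advice above.
schema_str = """
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Order",
  "type": "object",
  "properties": {
    "ordertime": {"type": "integer"},
    "orderid": {"type": "integer"},
    "itemid": {"type": "string"},
    "address": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "state": {"type": "string"},
        "zipcode": {"type": "integer"}
      }
    }
  }
}
"""

schema_registry_client = SchemaRegistryClient({"url": "http://localhost:8081"})
json_serializer = JSONSerializer(schema_str, schema_registry_client)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": json_serializer,
})

order = {
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Mountain View", "state": "CA", "zipcode": 94041},
}
producer.produce(topic="orders", key=str(order["orderid"]), value=order)
producer.flush()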

If you choose to use Avro or Protobuf instead, then the actual question is how to convert the JSON data into an Avro or Protobuf Python object, which again is not Kafka-specific. Once that step is done, the same pattern as above can be used, replacing the JSONSerializer with the one for Avro or Protobuf (a short Avro sketch follows).
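
With confluent-kafka's AvroSerializer, the conversion step is trivial for simple records, because plain Python dicts that match the schema serialize directly. A minimal sketch (the Avro schema below is hand-written for the sample record; the registry URL is again an assumption):

from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# Hand-written Avro equivalent of the sample record's schema.
avro_schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "ordertime", "type": "long"},
    {"name": "orderid", "type": "int"},
    {"name": "itemid", "type": "string"},
    {"name": "address", "type": {
      "type": "record",
      "name": "Address",
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "state", "type": "string"},
        {"name": "zipcode", "type": "long"}
      ]
    }}
  ]
}
"""

schema_registry_client = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(schema_registry_client, avro_schema_str)
# Wire avro_serializer in as value.serializer on the same SerializingProducer
# shown in the JSON sketch above; dicts matching the schema go straight in.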

Note that it's often a better idea to craft your schema manually instead of using a generator, and to think carefully about what's optional, what should be a union type (e.g. an Avro union ["null", "string"] with a null default keeps a field optional), and so on, since you want to keep the ability to update the schema in the future while respecting the Schema Registry compatibility rules, and an auto-generated schema might be too restrictive or not future-proof. Also, there are some technology limitations you want to take into account, e.g. Protobuf 3 is not able to distinguish missing values from default values, so you might decide to use wrapper types or similar.

The answer is no. I had a similar issue; the major challenge was to generate the key and value schemas.

The solution was to build a small application that creates a custom mapping between the JSON data and the Kafka schemas, along the lines of the sketch below.
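
What such a mapping can look like, as a minimal sketch (a hypothetical helper, not this answer's actual code, and it only covers the types appearing in the sample record):

import json

def infer_avro_schema(obj, name="Record"):
    """Map a JSON object to a rough Avro record schema."""
    fields = []
    for key, value in obj.items():
        if isinstance(value, bool):    # check bool before int: bool is an int subclass
            field_type = "boolean"
        elif isinstance(value, int):
            field_type = "long"
        elif isinstance(value, float):
            field_type = "double"
        elif isinstance(value, str):
            field_type = "string"
        elif isinstance(value, dict):  # nested object becomes a nested record
            field_type = infer_avro_schema(value, name=key.capitalize())
        else:
            raise TypeError(f"unsupported JSON type for field {key!r}")
        fields.append({"name": key, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}

sample = {
    "ordertime": 1497014222380,
    "orderid": 18,
    "itemid": "Item_184",
    "address": {"city": "Mountain View", "state": "CA", "zipcode": 94041},
}
print(json.dumps(infer_avro_schema(sample, "Order"), indent=2))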
