Kafka Connect - JDBC Avro: how to define a custom schema registry

I was following a tutorial on Kafka Connect, and I am wondering whether it is possible to define a custom schema registry for a topic whose data comes from a MySQL table.

I can't find where to define it in my JSON Connect config, and I don't want to create a new version of that schema after creating it.

My MySQL table, called stations, has this schema:

Field          | Type        
---------------+-------------
code           | varchar(4)  
date_measuring | timestamp   
attributes     | varchar(256)

where attributes contains JSON data rather than a plain string (I have to use that type because the JSON fields of attributes are variable).

My connector config is:

{
  "value.converter.schema.registry.url": "http://localhost:8081",
  "_comment": "The Kafka topic will be made up of this prefix, plus the table name  ",
  "key.converter.schema.registry.url": "http://localhost:8081",
  "name": "jdbc_source_mysql_stations",
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "transforms": [
    "ValueToKey"
  ],
  "transforms.ValueToKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
  "transforms.ValueToKey.fields": [
    "code",
    "date_measuring"
  ],
  "connection.url": "jdbc:mysql://localhost:3306/db_name?useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC",
  "connection.user": "confluent",
  "connection.password": "**************",
  "table.whitelist": [
    "stations"
  ],
  "mode": "timestamp",
  "timestamp.column.name": [
    "date_measuring"
  ],
  "validate.non.null": "false",
  "topic.prefix": "mysql-"
}
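
(For reference, a config like this can be created or updated through the Connect worker's REST API. Below is a minimal sketch using Java's built-in HTTP client; the file name stations-connector.json and the default worker port 8083 are assumptions:)

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Assumes the connector config above is saved as stations-connector.json
        String config = Files.readString(Path.of("stations-connector.json"));

        // PUT /connectors/{name}/config creates the connector if it does
        // not exist, or updates its config if it does.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors/jdbc_source_mysql_stations/config"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(config))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}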

and it registers this schema:

{
  "subject": "mysql-stations-value",
  "version": 1,
  "id": 23,
  "schema": "{\"type\":\"record\",\"name\":\"stations\",\"fields\":[{\"name\":\"code\",\"type\":\"string\"},{\"name\":\"date_measuring\",\"type\":{\"type\":\"long\",\"connect.version\":1,\"connect.name\":\"org.apache.kafka.connect.data.Timestamp\",\"logicalType\":\"timestamp-millis\"}},{\"name\":\"attributes\",\"type\":\"string\"}],\"connect.name\":\"stations\"}"
}

Where "attributes" field is of course a String.其中“属性”字段当然是一个字符串。 Unlike I would apply it this other schema.不像我会应用它这个其他模式。

    {
  "fields": [
    {
      "name": "code",
      "type": "string"
    },
    {
      "name": "date_measuring",
      "type": {
        "connect.name": "org.apache.kafka.connect.data.Timestamp",
        "connect.version": 1,
        "logicalType": "timestamp-millis",
        "type": "long"
      }
    },
    {
      "name": "attributes",
      "type": {
        "type": "record",
        "name": "AttributesRecord",
        "fields": [
          {
            "name": "H1",
            "type": "long",
            "default": 0
          },
          {
            "name": "H2",
            "type": "long",
            "default": 0
          },
          {
            "name": "H3",
            "type": "long",
            "default": 0
          },          
          {
            "name": "H",
            "type": "long",
            "default": 0
          },          
          {
            "name": "Q",
            "type": "long",
            "default": 0
          },          
          {
            "name": "P1",
            "type": "long",
            "default": 0
          },          
          {
            "name": "P2",
            "type": "long",
            "default": 0
          },          
          {
            "name": "P3",
            "type": "long",
            "default": 0
          },                    
          {
            "name": "P",
            "type": "long",
            "default": 0
          },          
          {
            "name": "T",
            "type": "long",
            "default": 0
          },          
          {
            "name": "Hr",
            "type": "long",
            "default": 0
          },          
          {
            "name": "pH",
            "type": "long",
            "default": 0
          },          
          {
            "name": "RX",
            "type": "long",
            "default": 0
          },          
          {
            "name": "Ta",
            "type": "long",
            "default": 0
          },  
          {
            "name": "C",
            "type": "long",
            "default": 0
          },                  
          {
            "name": "OD",
            "type": "long",
            "default": 0
          },          
          {
            "name": "TU",
            "type": "long",
            "default": 0
          },          
          {
            "name": "MO",
            "type": "long",
            "default": 0
          },          
          {
            "name": "AM",
            "type": "long",
            "default": 0
          },          
          {
            "name": "N03",
            "type": "long",
            "default": 0
          },          
          {
            "name": "P04",
            "type": "long",
            "default": 0
          },          
          {
            "name": "SS",
            "type": "long",
            "default": 0
          },          
          {
            "name": "PT",
            "type": "long",
            "default": 0
          }          
        ]
      }
    }
  ],
  "name": "stations",
  "namespace": "com.mycorp.mynamespace",
  "type": "record"
}

Any suggestions? If it's not possible, I suppose I'll have to write a Kafka Streams application to populate another topic, even though I would rather avoid that.

Thanks in advance!

I don't think you're asking about using a "custom" registry (which you'd configure with the two lines that say which registry you're using), but rather how you can parse the data / apply a schema after the record is pulled from the database.

You can write your own Transform (see the sketch below), or you can use Kafka Streams; those are really the main options here. There is a SetSchemaMetadata transform, but I'm not sure that'll do what you want (parse a string into an Avro record).
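
To illustrate the Transform route, here is a minimal sketch of a custom SMT that parses the attributes JSON string into a nested Struct, so the AvroConverter registers a record schema for it instead of a plain string. The class name, package, and the trimmed field list are assumptions, and it uses Jackson for the parsing:

package com.mycorp.transforms;

import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.transforms.Transformation;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical SMT: parses the JSON string in "attributes" into a nested
// Struct so that the AvroConverter registers a record schema for it.
public class ParseAttributesJson<R extends ConnectRecord<R>> implements Transformation<R> {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Subset of the measurement fields from the question; extend to the
    // full H1..PT list as needed.
    private static final String[] MEASUREMENTS = {"H1", "H2", "H3", "H", "Q", "P", "T"};

    private static final Schema ATTRIBUTES_SCHEMA;
    static {
        SchemaBuilder b = SchemaBuilder.struct().name("AttributesRecord");
        for (String f : MEASUREMENTS) {
            b.field(f, SchemaBuilder.int64().defaultValue(0L).build());
        }
        ATTRIBUTES_SCHEMA = b.build();
    }

    @Override
    public R apply(R record) {
        Struct value = (Struct) record.value();
        Schema oldSchema = record.valueSchema();

        // Copy the value schema, swapping the "attributes" string field
        // for the nested record schema.
        SchemaBuilder builder = SchemaBuilder.struct().name(oldSchema.name());
        for (Field f : oldSchema.fields()) {
            builder.field(f.name(), "attributes".equals(f.name()) ? ATTRIBUTES_SCHEMA : f.schema());
        }
        Schema newSchema = builder.build();

        Struct newValue = new Struct(newSchema);
        for (Field f : oldSchema.fields()) {
            if ("attributes".equals(f.name())) {
                newValue.put(f.name(), parseAttributes(value.getString(f.name())));
            } else {
                newValue.put(f.name(), value.get(f));
            }
        }
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(), newSchema, newValue, record.timestamp());
    }

    // Best-effort JSON parse; missing or unreadable fields fall back to 0.
    private Struct parseAttributes(String json) {
        Struct s = new Struct(ATTRIBUTES_SCHEMA);
        JsonNode node;
        try {
            node = MAPPER.readTree(json == null ? "{}" : json);
        } catch (Exception e) {
            node = MAPPER.createObjectNode();
        }
        for (String f : MEASUREMENTS) {
            s.put(f, node.has(f) ? node.get(f).asLong() : 0L);
        }
        return s;
    }

    @Override
    public ConfigDef config() { return new ConfigDef(); }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}

You would build that into a JAR, put it on the worker's plugin.path, and chain it after the existing transform, e.g. "transforms": "ValueToKey,ParseAttributes" with "transforms.ParseAttributes.type": "com.mycorp.transforms.ParseAttributesJson".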

Or, if you must shove JSON data into a single database column, maybe you shouldn't use MySQL, and instead use a document database, which has more flexible data constraints.

Otherwise, you can use a BLOB rather than varchar and put binary Avro data into that column, but then you'd still need a custom deserializer to read the data.
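
For illustration, reading such a BLOB back with plain Avro might look like the sketch below, assuming the writer schema is known to the reader and the column holds raw binary-encoded Avro; the class name and the two-field schema literal are assumptions:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class AttributesBlobReader {

    // Writer schema, trimmed to two fields for brevity; in practice it
    // would carry the full H1..PT field list.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"AttributesRecord\",\"fields\":["
        + "{\"name\":\"H1\",\"type\":\"long\",\"default\":0},"
        + "{\"name\":\"Q\",\"type\":\"long\",\"default\":0}]}");

    private final GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);

    // Decodes one binary-encoded Avro record from the BLOB bytes.
    public GenericRecord read(byte[] blob) throws java.io.IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(blob, null);
        return reader.read(null, decoder);
    }
}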
