
Kafka JDBC sink connector creates data types that do not match the original

I am using Kafka and Kafka Connect to replicate a MS SQL Server database to MySQL, using the Debezium SQL Server CDC source connector and the Confluent JDBC sink connector. "auto.create" is set to true and the sink connector did create the tables, but some of the data types do not match. In SQL Server, I have

CREATE TABLE employees (
  id INTEGER IDENTITY(1001,1) NOT NULL PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  start_date DATE,
  salary INT,
  secret FLOAT,
  create_time TIME
);

but in MySQL, it created the following:

mysql> desc employees;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id          | int         | NO   | PRI | NULL    |       |
| first_name  | text        | NO   |     | NULL    |       |
| last_name   | text        | NO   |     | NULL    |       |
| email       | text        | NO   |     | NULL    |       |
| start_date  | int         | YES  |     | NULL    |       |
| salary      | int         | YES  |     | NULL    |       |
| secret      | double      | YES  |     | NULL    |       |
| create_time | bigint      | YES  |     | NULL    |       |
| messageTS   | datetime(3) | YES  |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+

Ignore messageTS; that's an extra field I added in the SMT.

The data types for first_name, last_name, email, start_date and create_time all do not match. It converts VARCHAR(255) to TEXT, DATE to INT, and TIME to BIGINT.

Just wondering if anything is misconfigured?

I'm running SQL Server 2019 and MySQL 9.0.28 using Docker.

I've also tried the suggestion of disabling auto.create and auto.evolve and pre-creating the tables with the proper data types:

mysql> desc employees;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int          | NO   | PRI | NULL    | auto_increment |
| first_name  | varchar(255) | NO   |     | NULL    |                |
| last_name   | varchar(255) | NO   |     | NULL    |                |
| email       | varchar(255) | NO   |     | NULL    |                |
| start_date  | date         | NO   |     | NULL    |                |
| salary      | int          | NO   |     | NULL    |                |
| secret      | double       | NO   |     | NULL    |                |
| create_time | datetime     | NO   |     | NULL    |                |
| messageTS   | datetime     | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
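
For reference, the pre-created table corresponds roughly to the following MySQL DDL (reconstructed from the DESC output above, not necessarily the exact statement I ran):

CREATE TABLE employees (
  id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name  VARCHAR(255) NOT NULL,
  last_name   VARCHAR(255) NOT NULL,
  email       VARCHAR(255) NOT NULL,
  start_date  DATE NOT NULL,
  salary      INT NOT NULL,
  secret      DOUBLE NOT NULL,
  create_time DATETIME NOT NULL,
  messageTS   DATETIME NOT NULL
);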

But it gives the following exception when trying to insert into the database:

kafka-connect  | [2022-03-04 19:55:07,331] INFO Setting metadata for table "employees" to Table{name='"employees"', type=TABLE columns=[Column{'first_name', isPrimaryKey=false, allowsNull=false, sqlType=VARCHAR}, Column{'secret', isPrimaryKey=false, allowsNull=false, sqlType=DOUBLE}, Column{'salary', isPrimaryKey=false, allowsNull=false, sqlType=INT}, Column{'start_date', isPrimaryKey=false, allowsNull=false, sqlType=DATE}, Column{'email', isPrimaryKey=false, allowsNull=false, sqlType=VARCHAR}, Column{'id', isPrimaryKey=true, allowsNull=false, sqlType=INT}, Column{'last_name', isPrimaryKey=false, allowsNull=false, sqlType=VARCHAR}, Column{'messageTS', isPrimaryKey=false, allowsNull=false, sqlType=DATETIME}, Column{'create_time', isPrimaryKey=false, allowsNull=false, sqlType=DATETIME}]} (io.confluent.connect.jdbc.util.TableDefinitions)
kafka-connect  | [2022-03-04 19:55:07,382] WARN Write of 4 records failed, remainingRetries=0 (io.confluent.connect.jdbc.sink.JdbcSinkTask)
kafka-connect  | java.sql.BatchUpdateException: Data truncation: Incorrect date value: '19055' for column 'start_date' at row 1

The value of the message is:

{"id":1002,"first_name":"George","last_name":"Bailey","email":"george.bailey@acme.com","start_date":{"int":19055},"salary":{"int":100000},"secret":{"double":0.867153569942739},"create_time":{"long":1646421476477}}

The schema of the message for the start_date field is:

    {
      "name": "start_date",
      "type": [
        "null",
        {
          "type": "int",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Date"
        }
      ],
      "default": null
    }

It looks like it does not know how to convert an io.debezium.time.Date to a DATE and treats it as an int instead.
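
Presumably 19055 is the number of days since the Unix epoch (1970-01-01), which is how io.debezium.time.Date encodes a DATE. A quick check in a MySQL shell, just to illustrate the encoding:

mysql> SELECT DATE_ADD('1970-01-01', INTERVAL 19055 DAY) AS start_date;

This returns 2022-03-04, so the sink is handing MySQL the raw day count rather than a date.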

Any pointers on this are greatly appreciated.

Source config:

{
    "name": "SimpleSQLServerCDC",
    "config":{
      "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
      "tasks.max":1,
      "key.converter": "io.confluent.connect.avro.AvroConverter",
      "key.converter.schema.registry.url": "http://schema-registry:8081",
      "value.converter": "io.confluent.connect.avro.AvroConverter",
      "value.converter.schema.registry.url": "http://schema-registry:8081",
      "confluent.topic.bootstrap.servers":"kafka:29092",
      "database.hostname" : "sqlserver",
      "database.port" : "1433",
      "database.user" : "sa",
      "database.password" : "",
      "database.dbname" : "testDB",
      "database.server.name" : "corporation",

      "database.history.kafka.topic": "dbhistory.corporation",
      "database.history.kafka.bootstrap.servers" : "kafka:29092",

      "topic.creation.default.replication.factor": 1,
      "topic.creation.default.partitions": 10,
      "topic.creation.default.cleanup.policy": "delete"
    }
  }

Sink config:

{
  "name": "SimpleMySQLJDBC",
  "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
          "connection.url": "jdbc:mysql://mysql:3306/sinkdb",
          "connection.user": "user",
          "connection.password": "",
          "tasks.max": "2",
          "topics.regex": "corporation.dbo.*",
          "auto.create": "true",
          "auto.evolve": "true",
          "dialect.name": "MySqlDatabaseDialect",
          "insert.mode": "upsert",
          "pk.mode": "record_key",
          "pk.fields":"id",
          "delete.enabled": "true",
          "batch.size": 1,
          "key.converter":"io.confluent.connect.avro.AvroConverter",
          "key.converter.schema.registry.url": "http://schema-registry:8081",
          "value.converter": "io.confluent.connect.avro.AvroConverter",
          "value.converter.schema.registry.url": "http://schema-registry:8081",

          "transforms":"unwrap,dropPrefix,insertTS",

          "transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
          "transforms.dropPrefix.regex":"corporation.dbo.(.*)",
          "transforms.dropPrefix.replacement":"$1",

          "transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState",
          "transforms.unwrap.drop.tombstones":"false",
          "transforms.unwrap.delete.handling.mode":"drop",

          "transforms.insertTS.type": "org.apache.kafka.connect.transforms.InsertField$Value",
          "transforms.insertTS.timestamp.field": "messageTS",

          "errors.log.enable": "true",
          "errors.log.include.messages": "true",
          "errors.tolerance":"all",
          "errors.deadletterqueue.topic.name":"dlq-mysql",
          "errors.deadletterqueue.context.headers.enable": "true",
          "errors.deadletterqueue.topic.replication.factor":"1"
      }
}

converts VARCHAR(255) to text

The character limit of the fields is not carried through the Connect API data types. Any string-like data will become TEXT column types.
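
If you do keep auto.create on, one workaround (my suggestion, not something the connector does for you) is to tighten the columns after the connector has created them, for example:

ALTER TABLE employees
  MODIFY first_name VARCHAR(255) NOT NULL,
  MODIFY last_name  VARCHAR(255) NOT NULL,
  MODIFY email      VARCHAR(255) NOT NULL;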

DATE to int, and TIME to bigint

I think, by default, date and time values are converted into Unix epoch values. You can use the TimestampConverter transform to convert them to a different format.


Overall, if you want to accurately preserve types, disable the auto-creation of tables from the sink connector and pre-create tables with the types you want.

You need to make two changes:
In the source connector, add "time.precision.mode": "connect"
In the sink connector, add:

"transforms": "TimestampConverter",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "dob",
