
Kafka JDBC sink connector creates data types that do not match the original

I am using Kafka and Kafka Connect to replicate a MS SQL Server database to MySQL, using the Debezium SQL Server CDC source connector and the Confluent JDBC sink connector. "auto.create" is set to true and the sink connector did create the tables, but some of the data types do not match. In SQL Server, I have

CREATE TABLE employees (
  id INTEGER IDENTITY(1001,1) NOT NULL PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  start_date DATE,
  salary INT,
  secret FLOAT,
  create_time TIME
);

but in MySQL, it created the following:

mysql> desc employees;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| id          | int         | NO   | PRI | NULL    |       |
| first_name  | text        | NO   |     | NULL    |       |
| last_name   | text        | NO   |     | NULL    |       |
| email       | text        | NO   |     | NULL    |       |
| start_date  | int         | YES  |     | NULL    |       |
| salary      | int         | YES  |     | NULL    |       |
| secret      | double      | YES  |     | NULL    |       |
| create_time | bigint      | YES  |     | NULL    |       |
| messageTS   | datetime(3) | YES  |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+

Ignore messageTS; that's an extra field I added in the SMT.

The data types for first_name, last_name, email, start_date and create_time all do not match. It converts VARCHAR(255) to TEXT, DATE to INT, and TIME to BIGINT.

Just wondering if anything is misconfigured?

I'm running SQL Server 2019 and MySQL 9.0.28 using Docker.

I've also tried the suggestion of disabling auto.create and auto.evolve and pre-creating the tables with the proper data types:

mysql> desc employees;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int          | NO   | PRI | NULL    | auto_increment |
| first_name  | varchar(255) | NO   |     | NULL    |                |
| last_name   | varchar(255) | NO   |     | NULL    |                |
| email       | varchar(255) | NO   |     | NULL    |                |
| start_date  | date         | NO   |     | NULL    |                |
| salary      | int          | NO   |     | NULL    |                |
| secret      | double       | NO   |     | NULL    |                |
| create_time | datetime     | NO   |     | NULL    |                |
| messageTS   | datetime     | NO   |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+
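
For reference, the pre-created table corresponds roughly to the following MySQL DDL (reconstructed from the DESC output above, not necessarily the exact statement I ran):

CREATE TABLE employees (
  id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name  VARCHAR(255) NOT NULL,
  last_name   VARCHAR(255) NOT NULL,
  email       VARCHAR(255) NOT NULL,
  start_date  DATE NOT NULL,
  salary      INT NOT NULL,
  secret      DOUBLE NOT NULL,
  create_time DATETIME NOT NULL,
  messageTS   DATETIME NOT NULL
);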

But it gives the following exception when trying to insert into the database:

kafka-connect  | [2022-03-04 19:55:07,331] INFO Setting metadata for table "employees" to Table{name='"employees"', type=TABLE columns=[Column{'first_name', isPrimaryKey=false, allowsNull=false, sqlType=VARCHAR}, Column{'secret', isPrimaryKey=false, allowsNull=false, sqlType=DOUBLE}, Column{'salary', isPrimaryKey=false, allowsNull=false, sqlType=INT}, Column{'start_date', isPrimaryKey=false, allowsNull=false, sqlType=DATE}, Column{'email', isPrimaryKey=false, allowsNull=false, sqlType=VARCHAR}, Column{'id', isPrimaryKey=true, allowsNull=false, sqlType=INT}, Column{'last_name', isPrimaryKey=false, allowsNull=false, sqlType=VARCHAR}, Column{'messageTS', isPrimaryKey=false, allowsNull=false, sqlType=DATETIME}, Column{'create_time', isPrimaryKey=false, allowsNull=false, sqlType=DATETIME}]} (io.confluent.connect.jdbc.util.TableDefinitions)
kafka-connect  | [2022-03-04 19:55:07,382] WARN Write of 4 records failed, remainingRetries=0 (io.confluent.connect.jdbc.sink.JdbcSinkTask)
kafka-connect  | java.sql.BatchUpdateException: Data truncation: Incorrect date value: '19055' for column 'start_date' at row 1

The value of the message is:

{"id":1002,"first_name":"George","last_name":"Bailey","email":"george.bailey@acme.com","start_date":{"int":19055},"salary":{"int":100000},"secret":{"double":0.867153569942739},"create_time":{"long":1646421476477}}

The schema of the message for the start_date field is:

    {
      "name": "start_date",
      "type": [
        "null",
        {
          "type": "int",
          "connect.version": 1,
          "connect.name": "io.debezium.time.Date"
        }
      ],
      "default": null
    }

It looks like it does not know how to convert an io.debezium.time.Date to a DATE and treats it as an int instead.
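
Presumably 19055 is the number of days since the Unix epoch (1970-01-01), which is how io.debezium.time.Date encodes a DATE. A quick check in a MySQL shell, just to illustrate the encoding:

mysql> SELECT DATE_ADD('1970-01-01', INTERVAL 19055 DAY) AS start_date;

This returns 2022-03-04, so the sink is handing MySQL the raw day count rather than a date.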

Any pointers on this are greatly appreciated.

Source config:

{
    "name": "SimpleSQLServerCDC",
    "config":{
      "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
      "tasks.max":1,
      "key.converter": "io.confluent.connect.avro.AvroConverter",
      "key.converter.schema.registry.url": "http://schema-registry:8081",
      "value.converter": "io.confluent.connect.avro.AvroConverter",
      "value.converter.schema.registry.url": "http://schema-registry:8081",
      "confluent.topic.bootstrap.servers":"kafka:29092",
      "database.hostname" : "sqlserver",
      "database.port" : "1433",
      "database.user" : "sa",
      "database.password" : "",
      "database.dbname" : "testDB",
      "database.server.name" : "corporation",

      "database.history.kafka.topic": "dbhistory.corporation",
      "database.history.kafka.bootstrap.servers" : "kafka:29092",

      "topic.creation.default.replication.factor": 1,
      "topic.creation.default.partitions": 10,
      "topic.creation.default.cleanup.policy": "delete"
    }
  }

Sink config:

{
  "name": "SimpleMySQLJDBC",
  "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
          "connection.url": "jdbc:mysql://mysql:3306/sinkdb",
          "connection.user": "user",
          "connection.password": "",
          "tasks.max": "2",
          "topics.regex": "corporation.dbo.*",
          "auto.create": "true",
          "auto.evolve": "true",
          "dialect.name": "MySqlDatabaseDialect",
          "insert.mode": "upsert",
          "pk.mode": "record_key",
          "pk.fields":"id",
          "delete.enabled": "true",
          "batch.size": 1,
          "key.converter":"io.confluent.connect.avro.AvroConverter",
          "key.converter.schema.registry.url": "http://schema-registry:8081",
          "value.converter": "io.confluent.connect.avro.AvroConverter",
          "value.converter.schema.registry.url": "http://schema-registry:8081",

          "transforms":"unwrap,dropPrefix,insertTS",

          "transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
          "transforms.dropPrefix.regex":"corporation.dbo.(.*)",
          "transforms.dropPrefix.replacement":"$1",

          "transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState",
          "transforms.unwrap.drop.tombstones":"false",
          "transforms.unwrap.delete.handling.mode":"drop",

          "transforms.insertTS.type": "org.apache.kafka.connect.transforms.InsertField$Value",
          "transforms.insertTS.timestamp.field": "messageTS",

          "errors.log.enable": "true",
          "errors.log.include.messages": "true",
          "errors.tolerance":"all",
          "errors.deadletterqueue.topic.name":"dlq-mysql",
          "errors.deadletterqueue.context.headers.enable": "true",
          "errors.deadletterqueue.topic.replication.factor":"1"
      }
}

converts VARCHAR(255) to text

The character limit of the fields is not carried through the Connect API data types. Any string-like data will become TEXT column types.
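
If you do keep auto.create on, one workaround (my suggestion, not something the connector does for you) is to tighten the columns after the connector has created them, for example:

ALTER TABLE employees
  MODIFY first_name VARCHAR(255) NOT NULL,
  MODIFY last_name  VARCHAR(255) NOT NULL,
  MODIFY email      VARCHAR(255) NOT NULL;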

DATE to int, and TIME to bigint

I think, by default, date and time values are converted into Unix epoch values. You can use the TimestampConverter transform to convert them to a different format.


Overall, if you want to accurately preserve types, disable the auto-creation of tables from the sink connector and pre-create tables with the types you want.

You need to make two changes:
In the source connector, add "time.precision.mode": "connect"
In the sink connector, add:

"transforms": "TimestampConverter",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.target.type": "Timestamp",
"transforms.TimestampConverter.field": "dob",
