
SQL Server CDC - Snapshot from specific date for Debezium Kafka

We are enabling CDC on specific tables in our MSSQL database. The data flows through a migration pipeline: MSSQL -> CDC -> Debezium -> Kafka Connect.

One table has more than a million rows, but we need only a few thousand of them to be included in the snapshot that is created when CDC is enabled. I don't want to handle this in our Kafka consumer because, while I need just 1% of the data to be written to Mongo, the remaining 99% would hit the consumer for no purpose.


Questions:

  1. Is it possible to create a snapshot of specific rows or views while enabling CDC? I need the rows whose modified-date column is later than a specific date.
  2. Is this too much of a micro-optimisation, and should I instead let everything flow through the pipeline and be rejected by the consumer?

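You can use a Kafka Connect Single Message Transform (SMT). Because the transform runs inside Kafka Connect before records are written to Kafka, the rows you filter out never reach the topic or your consumer (Debezium still reads them from SQL Server during the snapshot, though). More precisely, you need the Filter SMT: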

filter.condition is a predicate, written as a JSON path, that is applied to each processed record. When the predicate matches a record, that record is either included (when filter.type=include) or excluded (when filter.type=exclude).
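To make the include/exclude semantics concrete, here is a minimal Python simulation. It is not the Confluent implementation: the hypothetical apply_filter function takes a plain Python predicate in place of a JSON path, and the records are simple dicts rather than Connect records.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def apply_filter(
    records: Iterable[Record],
    predicate: Callable[[Record], bool],
    filter_type: str,
) -> List[Record]:
    """Hypothetical stand-in for the Filter SMT's per-record decision.

    filter_type="include": keep only records for which the predicate matches.
    filter_type="exclude": drop records for which the predicate matches.
    """
    if filter_type not in ("include", "exclude"):
        raise ValueError(f"unknown filter_type: {filter_type!r}")
    keep_on_match = filter_type == "include"
    # A record survives when "predicate matched" equals "keep on match".
    return [r for r in records if predicate(r) == keep_on_match]


rows = [
    {"id": 1, "modified_date": 10},
    {"id": 2, "modified_date": 20},
    {"id": 3, "modified_date": 30},
]
newer_than_15 = lambda r: r["modified_date"] > 15

print([r["id"] for r in apply_filter(rows, newer_than_15, "include")])  # [2, 3]
print([r["id"] for r in apply_filter(rows, newer_than_15, "exclude")])  # [1]
```

The same predicate therefore selects complementary sets of records depending on filter.type.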


In your case, you can include the rows that satisfy your condition:

transforms=filterExample1
transforms.filterExample1.type=io.confluent.connect.transforms.Filter$Value
# Compare against modified_date as the connector serializes it (Debezium typically emits DATETIME as epoch milliseconds).
transforms.filterExample1.filter.condition=$.value[?(@.modified_date > "1/1/2020")]
transforms.filterExample1.filter.type=include
transforms.filterExample1.missing.or.null.behavior=fail
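If modified_date reaches the transform as a number rather than a date string, the literal "1/1/2020" will not give a date comparison. Assuming the column is DATETIME (or DATETIME2 with precision 3 or less) and Debezium's default time.precision.mode, the value should be an io.debezium.time.Timestamp, i.e. milliseconds since the epoch with no time zone. The sketch below computes that cutoff and renders the condition string; epoch_millis and build_condition are hypothetical helpers, and the $.value path is copied unchanged from the configuration above.

```python
from datetime import datetime, timezone


def epoch_millis(year: int, month: int, day: int) -> int:
    """Milliseconds since 1970-01-01T00:00:00 for midnight of the given date.

    Assumption: io.debezium.time.Timestamp carries no time zone, so the
    date is interpreted as if it were UTC here.
    """
    dt = datetime(year, month, day, tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


def build_condition(path: str, field: str, op: str, cutoff_ms: int) -> str:
    """Hypothetical helper: render a JSON path filter comparing a numeric field."""
    return f"{path}[?(@.{field} {op} {cutoff_ms})]"


cutoff = epoch_millis(2020, 1, 1)
print(cutoff)  # 1577836800000
print(build_condition("$.value", "modified_date", ">", cutoff))
# $.value[?(@.modified_date > 1577836800000)]
```

Check the actual schema of a snapshot record before relying on this; other column types (DATE, DATETIME2 with higher precision) map to different units.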

Alternatively, you can specify which rows to exclude:

transforms=filterExample1
transforms.filterExample1.type=io.confluent.connect.transforms.Filter$Value
transforms.filterExample1.filter.condition=$.value[?(@.modified_date <= "1/1/2020")]
transforms.filterExample1.filter.type=exclude
transforms.filterExample1.missing.or.null.behavior=fail
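The include-with-">" and exclude-with-"<=" configurations should keep exactly the same rows, as long as every record has a non-null modified_date (with missing.or.null.behavior=fail, a record without it should fail the task rather than be filtered). This small Python check illustrates the equivalence, including the boundary row that sits exactly on the cutoff; the cutoff value assumes modified_date is epoch milliseconds, and the two functions are hypothetical stand-ins for the two SMT configurations.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

CUTOFF_MS = 1577836800000  # 2020-01-01T00:00:00 as epoch milliseconds (assumed unit)


def include_newer(records: List[Record]) -> List[Record]:
    # Mirrors filter.type=include with the condition modified_date > cutoff.
    return [r for r in records if r["modified_date"] > CUTOFF_MS]


def exclude_older_or_equal(records: List[Record]) -> List[Record]:
    # Mirrors filter.type=exclude with the condition modified_date <= cutoff.
    return [r for r in records if not (r["modified_date"] <= CUTOFF_MS)]


rows = [
    {"id": 1, "modified_date": CUTOFF_MS - 1},
    {"id": 2, "modified_date": CUTOFF_MS},
    {"id": 3, "modified_date": CUTOFF_MS + 1},
]

# Both configurations keep the same rows; the row exactly at the cutoff is dropped by both.
assert include_newer(rows) == exclude_older_or_equal(rows)
print([r["id"] for r in include_newer(rows)])  # [3]
```

Which form to use is mostly a matter of readability; the include form states the rows you want directly.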
