SQL Server CDC - Debezium Kafka snapshot from a specific date

Question

We are enabling CDC on specific tables in our MSSQL database. The data is migrated through the pipeline MSSQL -> CDC -> Debezium -> Kafka Connect.

One table has more than a million rows, but we need only a few thousand of them to be included in the snapshot created when CDC is enabled. I don't want to handle this in our Kafka consumer because, while only 1% of the data needs to be written to Mongo, the remaining 99% would hit the consumer for no purpose.

Questions:

Is it possible to create a snapshot of only specific rows/views while enabling CDC? I need the rows whose column value (modified date) is later than a specific date.
Is this too much of a micro-optimisation, and should I instead let everything flow into the pipeline and be rejected by the consumer?

Answer 1

You can use a Kafka Connect Single Message Transform (SMT). More precisely, you need the Filter SMT:

The filter.condition is a predicate specifying a JSON path that is applied to each record processed; when this predicate matches a record, the record is either included (when filter.type=include) or excluded (when filter.type=exclude).
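
To make the include/exclude semantics concrete, here is a minimal Python sketch that models them outside Kafka Connect. The function apply_filter and its parameters are hypothetical: they only mimic filter.type and missing.or.null.behavior as described above, and a plain Python predicate stands in for the JSON path condition.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def apply_filter(record: Record,
                 predicate: Callable[[Record], bool],
                 filter_type: str = "include",
                 missing_or_null_behavior: str = "fail",
                 field: str = "modified_date") -> bool:
    """Hypothetical model: return True if the record is kept, False if dropped."""
    if record.get(field) is None:
        # Mimics the fail / include / exclude choices of missing.or.null.behavior.
        if missing_or_null_behavior == "fail":
            raise ValueError(f"field {field!r} is missing or null")
        return missing_or_null_behavior == "include"
    matched = predicate(record)
    if filter_type == "include":
        return matched        # keep only records that match the predicate
    if filter_type == "exclude":
        return not matched    # drop records that match the predicate
    raise ValueError(f"unknown filter.type: {filter_type!r}")


def is_newer(record: Record) -> bool:
    # Stand-in for the JSON path predicate (illustrative threshold).
    return record["modified_date"] > 100


rows = [{"id": 1, "modified_date": 50}, {"id": 2, "modified_date": 150}]
print([r["id"] for r in rows if apply_filter(r, is_newer, "include")])  # [2]
print([r["id"] for r in rows if apply_filter(r, is_newer, "exclude")])  # [1]
```

The same predicate drives both modes; only the interpretation of a match changes.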

In your case, you can include the rows that satisfy your condition:

# Keep only rows modified after the cutoff date
transforms=filterExample1
transforms.filterExample1.type=io.confluent.connect.transforms.Filter$Value
transforms.filterExample1.filter.condition=$.value[?(@.modified_date > "1/1/2020")]
transforms.filterExample1.filter.type=include
transforms.filterExample1.missing.or.null.behavior=fail
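
One caveat with the condition above: if modified_date ends up being compared as a string, "1/1/2020" is compared character by character, not chronologically. Also, assuming the column is a SQL Server datetime and the connector uses its default time.precision.mode, Debezium emits it as epoch milliseconds (io.debezium.time.Timestamp) rather than as formatted text, so the threshold would then need to be a number. The sketch below illustrates both points; the helper names epoch_millis and modified_after_threshold are hypothetical.

```python
from datetime import datetime, timezone


def epoch_millis(day: str) -> int:
    """Convert an ISO date (midnight UTC) to epoch milliseconds."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Hypothetical cutoff: 2020-01-01T00:00:00Z as epoch milliseconds.
THRESHOLD_MS = epoch_millis("2020-01-01")


def modified_after_threshold(modified_date_ms: int) -> bool:
    """Numeric comparison, which is chronological for epoch timestamps."""
    return modified_date_ms > THRESHOLD_MS


# Text comparison is lexicographic: '2' sorts after '/', so this prints True
# even though 31 Dec 2019 is earlier than 1 Jan 2020.
print("12/31/2019" > "1/1/2020")  # True
print(THRESHOLD_MS)  # 1577836800000
print(modified_after_threshold(epoch_millis("2019-12-31")))  # False
print(modified_after_threshold(epoch_millis("2020-06-11")))  # True
```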

Alternatively, you can specify which rows to exclude:

# Drop rows modified on or before the cutoff date
transforms=filterExample1
transforms.filterExample1.type=io.confluent.connect.transforms.Filter$Value
transforms.filterExample1.filter.condition=$.value[?(@.modified_date <= "1/1/2020")]
transforms.filterExample1.filter.type=exclude
transforms.filterExample1.missing.or.null.behavior=fail
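
The include and exclude variants are complements of each other: keeping rows where modified_date > cutoff selects the same rows as dropping rows where modified_date <= cutoff, including at the boundary value itself. A small Python check, assuming numeric epoch-millisecond dates and a hypothetical cutoff of 2020-01-01T00:00:00Z:

```python
from typing import List

# Hypothetical cutoff: 2020-01-01T00:00:00Z in epoch milliseconds.
CUTOFF_MS = 1_577_836_800_000


def keep_with_include(dates: List[int]) -> List[int]:
    """filter.type=include with condition modified_date > cutoff."""
    return [d for d in dates if d > CUTOFF_MS]


def keep_with_exclude(dates: List[int]) -> List[int]:
    """filter.type=exclude with condition modified_date <= cutoff."""
    return [d for d in dates if not (d <= CUTOFF_MS)]


sample = [CUTOFF_MS - 1, CUTOFF_MS, CUTOFF_MS + 1, CUTOFF_MS + 86_400_000]
assert keep_with_include(sample) == keep_with_exclude(sample)
# The row exactly at the cutoff is dropped by both variants.
print(keep_with_include(sample))
```

Which variant to use is mostly a matter of readability; what matters is that the two operators (> and <=) are exact complements.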

Answer 1
1 2020-06-11 12:27:45
