
Do FKs (foreign keys) present a problem if we set up RDBMS replication through Apache Kafka?

I am wondering whether Apache Kafka can be used to build a fault-tolerant environment with relational databases: one source instance and several instances replicated from it through Kafka.

I am new to Kafka, and many sources on the internet say that this can easily be done with Kafka Connect, but there are several aspects of this problem that I have never found an explanation for:

How can we guarantee that no foreign key is violated during replication? I have seen connectors that send data changes to a separate Kafka topic for each table in the database, but how do we read those changes back in the order in which they were created, so that no FK is violated during replication? Even if we put all changes into a single topic, that topic might be partitioned, so how would we read them in the original order? Does this mean that we can only use a single topic with a single partition? Or should we remove all FK constraints in the target databases and stop enforcing their integrity there?

I do feel that it is inappropriate to keep a relational database for read-only purposes, but it has a lot of legacy clients that we cannot afford to rewrite all at once.

I am currently working on a project that uses CDC (Change Data Capture) on RDBMS databases.

In my case, the CDC tool writes to a single topic per database table, and each topic has exactly 1 partition (to ensure that all messages arrive in order).
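To illustrate why the partition count matters, here is a minimal pure-Python sketch; it does not talk to a real Kafka cluster. The table names `customers` and `orders`, the round-robin distribution, and the helper functions are hypothetical. It only models the fact that Kafka preserves order within a partition, not across partitions.

```python
def split_round_robin(events, num_partitions):
    """Distribute events across partitions; order is kept only inside each partition."""
    partitions = [[] for _ in range(num_partitions)]
    for i, event in enumerate(events):
        partitions[i % num_partitions].append(event)
    return partitions


def consume(partitions, read_order):
    """Read whole partitions in the given order: one interleaving a consumer may observe."""
    seen = []
    for p in read_order:
        seen.extend(partitions[p])
    return seen


# Hypothetical change stream: each order references a customer inserted just before it.
changes = [
    ("customers", "insert", {"id": 1}),
    ("orders", "insert", {"id": 10, "customer_id": 1}),
    ("customers", "insert", {"id": 2}),
    ("orders", "insert", {"id": 11, "customer_id": 2}),
]

# One partition: the consumer sees exactly the source order.
single = consume(split_round_robin(changes, 1), read_order=[0])

# Two partitions: reading partition 1 before partition 0 delivers orders before customers.
multi = consume(split_round_robin(changes, 2), read_order=[1, 0])

print([table for table, _, _ in single])
print([table for table, _, _ in multi])
```

In the two-partition case the child rows can arrive before their parents, which is exactly the situation that would violate an FK constraint on the target database.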

Unfortunately, yes: Kafka does not guarantee FK integrity. By that I mean that if the data is consistent in the source database, it will also be consistent in Kafka, but Kafka has no validation mechanism that checks FK constraints (with Kafka Streams you can even join on a field that is not an FK).
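Since Kafka itself does not check FKs, any such check has to live in the consuming code. The following is a rough sketch of one possible approach, not something taken from my project: the consumer parks child rows until their parent row has been applied. The class `FkAwareApplier`, the tables `customers` and `orders`, and the column `customer_id` are all hypothetical, and in-memory collections stand in for the target database.

```python
from collections import defaultdict


class FkAwareApplier:
    """Applies change events and defers child rows whose parent has not arrived yet."""

    def __init__(self):
        self.customers = set()            # parent keys already applied
        self.orders = []                  # child rows applied, in apply order
        self.waiting = defaultdict(list)  # parent id -> child rows parked until it arrives

    def apply(self, table, row):
        if table == "customers":
            self.customers.add(row["id"])
            # The parent now exists, so release any children that were waiting for it.
            for child in self.waiting.pop(row["id"], []):
                self.orders.append(child)
        elif table == "orders":
            if row["customer_id"] in self.customers:
                self.orders.append(row)
            else:
                # Applying this row now would violate the FK, so park it instead.
                self.waiting[row["customer_id"]].append(row)


applier = FkAwareApplier()
applier.apply("orders", {"id": 10, "customer_id": 1})  # child arrives first
parked_count = len(applier.waiting[1])
applier.apply("customers", {"id": 1})                  # parent arrives, child is released
print(applier.orders)
```

A real consumer would also need to decide how long to wait for a missing parent and when to commit offsets, which this sketch leaves out.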

With Kafka Streams you can of course perform join operations, but you have to know the FK constraints of the source database to write valid business code.

EDIT: Of course, you can consume every topic that the CDC tool writes to and produce into another topic with more partitions; you can then redistribute the data however you want (even with a new schema).
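As a sketch of that redistribution idea, assume the ordered change stream is re-keyed by the parent key, so that a parent row and its child rows always land in the same partition and keep their relative order there. This is a pure-Python simulation: `zlib.crc32` is only a deterministic stand-in for Kafka's own key hashing, and the tables `customers` and `orders` are hypothetical.

```python
from zlib import crc32


def partition_for(key, num_partitions):
    # crc32 is a stand-in hash for illustration; it is not Kafka's actual partitioner.
    return crc32(str(key).encode("utf-8")) % num_partitions


def repartition(changes, num_partitions):
    """Route each change by its parent key so related rows share a partition."""
    out = [[] for _ in range(num_partitions)]
    for table, row in changes:
        # Customers are keyed by their own id, orders by the customer they reference.
        key = row["id"] if table == "customers" else row["customer_id"]
        out[partition_for(key, num_partitions)].append((table, row))
    return out


# Ordered input, as read from single-partition CDC topics.
changes = [
    ("customers", {"id": 1}),
    ("orders", {"id": 10, "customer_id": 1}),
    ("customers", {"id": 2}),
    ("orders", {"id": 11, "customer_id": 2}),
    ("orders", {"id": 12, "customer_id": 1}),
]

parts = repartition(changes, 4)
for number, part in enumerate(parts):
    print(number, part)
```

Each output partition can then be consumed independently without a child ever preceding its parent, provided the input was read in source order in the first place.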
