
Audit data changes with Debezium

I have a use case where I want to audit DB table data changes into another table for compliance purposes. Primarily, any changes to the data (inserts/updates/deletes) should be audited. I found several options: JaVers, Hibernate Envers, database triggers, and Debezium.

I am avoiding JaVers and Hibernate Envers, as they will not capture data changes made through direct SQL queries or by other applications. The other issue I see is that we would need to add the audit-related code to the main application code within the same transaction boundary.

I am also avoiding database triggers, as we do not use triggers at all in any of our deployments.

That leaves me with Debezium, which looks promising. The only concern I have is that we need to use Kafka to leverage Debezium. Is Kafka necessary for using Debezium if both the primary table and the audit table sit in the same DB instance?

Debezium is a good fit for auditing, but since it is a source connector, it represents just one part of the data pipeline in your use case. You capture every table change event (c = create, r = read, u = update, d = delete), store it on a Kafka topic or on local disk, and then you need a sink connector (e.g. the Camel Kafka SQL/JDBC connectors, or kafka-connect-jdbc) to insert the events into the target table.
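
As a minimal sketch of such a pipeline (the connector classes are real, but the host, credentials, database, table, and topic names are placeholders), you could register a Debezium MySQL source connector and a JDBC sink connector through the Kafka Connect REST API:

```json
{
  "name": "orders-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "db-host",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "secret",
    "database.server.id": "184054",
    "database.server.name": "app",
    "table.include.list": "appdb.orders"
  }
}
```

```json
{
  "name": "orders-audit-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:mysql://db-host:3306/appdb",
    "topics": "app.appdb.orders",
    "table.name.format": "orders_audit",
    "insert.mode": "insert",
    "auto.create": "true"
  }
}
```

Note that the JDBC sink expects flat records, so depending on how much of the change envelope you want in each audit row, you would either keep the full Debezium envelope or unwrap it with Debezium's ExtractNewRecordState SMT.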

For the same-transaction-boundary requirement, you can use the Outbox pattern if eventual consistency is acceptable. There is also an Outbox Event Router SMT (single message transform) that is part of the project.
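
As a sketch, the router is enabled as a transform in the source connector configuration (shown here in properties form; the `outbox` alias and the `appdb.outbox` table name are placeholders):

```properties
# Capture only the outbox table itself
table.include.list=appdb.outbox
# Route each outbox row to its destination topic
transforms=outbox
transforms.outbox.type=io.debezium.transforms.outbox.EventRouter
```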

Note that Debezium can also run embedded in a standalone Java application, storing the offsets on local disk, but you lose the HA capability provided by Kafka Connect running in distributed mode. With the embedded mode, you are also switching from a configuration-driven approach to a code-driven one.
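
A minimal sketch of the embedded mode, using the `DebeziumEngine` API with a file-based offset store (connection details are placeholders, and some property names differ across Debezium versions, e.g. newer releases use `topic.prefix` instead of `database.server.name`):

```java
import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EmbeddedAuditEngine {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "audit-engine");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        // Offsets are stored on local disk instead of a Kafka topic.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/debezium-offsets.dat");
        props.setProperty("offset.flush.interval.ms", "60000");
        // Placeholder connection details.
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "secret");
        props.setProperty("database.server.id", "5400");
        props.setProperty("database.server.name", "app");
        props.setProperty("table.include.list", "appdb.orders");

        DebeziumEngine<ChangeEvent<String, String>> engine =
            DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> {
                    // This callback is where you would INSERT the change
                    // event into the audit table, e.g. via plain JDBC.
                    System.out.println(record.value());
                })
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);
    }
}
```

Since the engine stores its offsets on local disk, Kafka is not involved at all in this mode, at the cost of the HA noted above.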

I found Debezium to be a very comprehensive solution, and it is open source, backed by Red Hat. That gives it not only credibility but also confidence that it will continue to be supported.

It provides rich configuration to include or exclude (whitelist/blacklist) databases, tables, and columns (with wildcard patterns), along with controls to limit the data captured from very large columns.
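
For illustration, a few of these options in properties form (the database, table, and column names are placeholders; older releases spell the same options as `database.whitelist`, `table.whitelist`, etc.):

```properties
# Capture only selected databases/tables (patterns are regex-based)
database.include.list=appdb
table.include.list=appdb.orders,appdb.customers
# Never capture sensitive columns
column.exclude.list=appdb.customers.ssn
# Truncate very large column values to 1024 characters
column.truncate.to.1024.chars=appdb.orders.notes
```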

Since it is driven by the binlogs, you not only get the current state of a row, you also get its previous state. This is ideal for audit trails, and from the topics Debezium produces (one per table) you can build a proper sink, for example to Elasticsearch.
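
For example, an (abbreviated) update event for a row might look like the JSON below; a real event also carries a schema section and more source metadata, and the field values here are made up:

```json
{
  "before": { "id": 1001, "status": "PENDING" },
  "after":  { "id": 1001, "status": "SHIPPED" },
  "source": { "connector": "mysql", "db": "appdb", "table": "orders" },
  "op": "u",
  "ts_ms": 1623932400000
}
```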

Use of Kafka is necessary to account for HA and for the latency incurred when bulk updates are made on the DB, even though the primary and audit tables are in the same DB instance.
