
Kafka Streams KTable Store not useful in this case for a compacted input topic, alternatives?

I am modeling an event sourcing application and have come across a conceptual doubt. I'll use a typical shopping domain to illustrate it:

Suppose a customer topic that receives events of the following kinds:

CustomerCreated id = x, name = xxx, address = xxx
CustomerUpdated id = x, name = xxx
CustomerUpdated id = x, address = xxx

Notice that the update events don't necessarily change/inform all customer fields.

I am materializing this topic using a KTable and using its store to run interactive queries:

KTable<Integer, Customer> customers = builder.table(Topics.CUSTOMER.keySerde(), Topics.CUSTOMER.valueSerde(), Topics.CUSTOMER_STORE.name());

Suppose there will be a lot of customers and I would like to use a compacted customer topic. This wouldn't work for recovery, since compaction keeps only the latest message per key, and in my case that message might not contain the whole state of the customer (it could be an update event with only partial info, e.g. the last CustomerUpdated event above carries only the address, not the name).

According to the javadoc for KStreamBuilder.table, the created KTable store has no separate changelog topic, so it is recovered from the original input topic:

The resulting KTable will be materialized in a local KeyValueStore with the given storeName. However, no internal changelog topic is created since the original input topic can be used for recovery 

In my case, how can I have a compacted topic for customers and at the same time have a store created from that topic which can be recovered with the full information of each customer?

As you noted correctly, your input topic cannot be compacted, because each update record is interpreted as an overwrite of the previous one and thus must be a "full" update ("partial" updates are not supported by changelog topics).

Reading a topic as a KTable follows the same semantics and materializes the topic with "put" operations into a key-value store (with tombstones executed as deletes).

If you want to do partial updates using Kafka Streams, you can use an aggregation instead, by reading the input topic as a KStream:

KTable table = builder.stream(...).groupByKey().aggregate(...);

This allows you to use a custom Aggregator that can perform partial updates. For each input record, you get the old/current state and the current input record (i.e., a potential partial update), and the Aggregator returns the new (updated) state. This gives you maximum flexibility and you can update the state as you wish.
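
A minimal sketch of what this could look like with the newer StreamsBuilder/Materialized API (the Customer and CustomerEvent types, their getters/setters, the customerEventSerde/customerSerde variables, and the topic and store names are illustrative assumptions, not part of Kafka Streams):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();

KTable<Integer, Customer> customers = builder
    .stream("customer", Consumed.with(Serdes.Integer(), customerEventSerde))
    .groupByKey()
    .aggregate(
        Customer::new,  // initializer: empty customer state
        (id, event, customer) -> {
            // Aggregator: merge the (possibly partial) event into the current state;
            // only fields present in the event overwrite the existing values.
            // getName()/setName()/getAddress()/setAddress() are assumed POJO accessors.
            if (event.getName() != null) {
                customer.setName(event.getName());
            }
            if (event.getAddress() != null) {
                customer.setAddress(event.getAddress());
            }
            return customer;
        },
        Materialized.<Integer, Customer, KeyValueStore<Bytes, byte[]>>as("customer-store")
            .withKeySerde(Serdes.Integer())
            .withValueSerde(customerSerde));

The store named "customer-store" can then be used for interactive queries, just like the store you currently create via builder.table.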

The input topic does not need to be compacted in this case. The resulting KTable will be backed by a changelog topic whose update records each contain a full copy of the state. This changelog topic is configured with log compaction automatically and thus will never lose its state.

You can also write the resulting table's changelog stream into an output topic, and that output topic should be configured with log compaction:

table.toStream().to(...);
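
For example, using the customers table from the sketch above (the output topic name "customer-state" and customerSerde are again just placeholders; log compaction itself is a topic-level setting, cleanup.policy=compact, that you apply when creating that topic, not via the Streams API):

// Write the full, merged customer state to an externally created, compacted output topic.
customers.toStream().to("customer-state", Produced.with(Serdes.Integer(), customerSerde));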

You might also want to disable caching in the aggregation step via the Materialized parameter. See the docs for more details: https://docs.confluent.io/current/streams/developer-guide/memory-mgmt.html
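
For example, caching can be turned off directly on the Materialized instance from the sketch above (store name and serdes are the same illustrative placeholders):

Materialized.<Integer, Customer, KeyValueStore<Bytes, byte[]>>as("customer-store")
    .withKeySerde(Serdes.Integer())
    .withValueSerde(customerSerde)
    .withCachingDisabled()  // forward every update downstream instead of buffering it in the cache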
