Kafka Streams - Inconsistency in number of messages in a topic

Let's suppose we have a Kafka topic that contains 1000 messages. We create a stream (we call it st in the following) out of it, and do the following:

int count = 0;

st.groupByKey().count().foreach((key, value) -> {
    // value is the updated count for this key
    count += value;
    System.out.println(count);
});

When the processing "ends", it prints a number slightly greater than 1000. What could cause this behavior?

If some of your messages have the same key, your code is double counting them. Please note that the function passed to foreach() method on a KTable is not executed once per row, but rather once per update to a row (perhaps not every update due to caching). See: https://kafka.apache.org/11/javadoc/org/apache/kafka/streams/kstream/KTable.html#foreach-org.apache.kafka.streams.kstream.ForeachAction-

Perform an action on each updated record of this KTable. Note that this is a terminal operation that returns void.

Note that foreach() is not applied to the internal state store and only called for each new KTable updated record.

Imagine you have 3 messages with key "A". The KTable created by the count() aggregation will be updated 3 times and your function (lambda expression) will be called 3 times with the following parameters: ("A", 1), ("A", 2), ("A", 3) resulting in count being incremented by 1+2+3=6, instead of being incremented by 3.
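The arithmetic above can be reproduced without a Kafka cluster. The following is a minimal plain-Java sketch, not Kafka Streams code: the class name DoubleCountDemo and the method sumOfUpdates are hypothetical, a HashMap stands in for the KTable built by count(), and it assumes every update is emitted (no caching).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the question's logic; no Kafka classes involved.
final class DoubleCountDemo {

    // Processes one record key at a time. The map plays the role of the
    // count() aggregation, and each new per-key count is added to a running
    // total, which is what the lambda passed to foreach() does.
    static long sumOfUpdates(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        long total = 0;
        for (String key : keys) {
            // New count for this key after the record is processed.
            long updated = counts.merge(key, 1L, Long::sum);
            // Assumption: every update reaches the callback (caching disabled).
            total += updated;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three records with the same key: updates are 1, 2, 3.
        System.out.println(sumOfUpdates(List.of("A", "A", "A"))); // prints 6
        // Three records with distinct keys: updates are 1, 1, 1.
        System.out.println(sumOfUpdates(List.of("A", "B", "C"))); // prints 3
    }
}
```

With unique keys the total matches the number of messages, so a result only slightly above 1000 suggests that a small number of keys repeat.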

KStream and KTable represent "data in motion" and their methods generally operate on the stream of data. If you would like to operate on the current snapshot of the data, consider using interactive queries instead. Possibly because the KTable.foreach method can be confusing at first, it has been deprecated with the following comment:

Deprecated. Use the Interactive Queries APIs (eg, KafkaStreams.store(String, QueryableStoreType) followed by ReadOnlyKeyValueStore.all()) to iterate over the keys of a KTable. Alternatively convert to a KStream using toStream() and then use foreach(action) on the result.
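To illustrate why iterating over a snapshot gives the expected total, here is a small plain-Java sketch. It is a simulation under stated assumptions rather than the Interactive Queries API: the class SnapshotCountDemo and the method sumOfLatestCounts are hypothetical, a HashMap stands in for the state store, and the list of entries stands in for the stream of KTable updates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; a real application would read the state store
// through the Interactive Queries API instead of a HashMap.
final class SnapshotCountDemo {

    // Applies each (key, count) update so that a later update for the same key
    // overwrites the earlier one, then sums the values of the final snapshot.
    static long sumOfLatestCounts(List<Map.Entry<String, Long>> updates) {
        Map<String, Long> snapshot = new HashMap<>();
        for (Map.Entry<String, Long> update : updates) {
            snapshot.put(update.getKey(), update.getValue());
        }
        long total = 0;
        for (long count : snapshot.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Updates produced by three records with key "A" and one with key "B".
        List<Map.Entry<String, Long>> updates = List.of(
                Map.entry("A", 1L),
                Map.entry("A", 2L),
                Map.entry("A", 3L),
                Map.entry("B", 1L));
        System.out.println(sumOfLatestCounts(updates)); // prints 4
    }
}
```

Summing every update in that list would give 7, while the snapshot yields 4, which is the number of records actually processed.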
