Kafka Streams committing just the latest message of KGroupedTable
I've got a Kafka Streams application as follows:
static KafkaStreams build(AppConfig appConfig, SerdesHelper serdes) {
  final KStreamBuilder builder = new KStreamBuilder();
  builder
      .table(serdes.sourceKeySerde, serdes.sourceValueSerde, appConfig.sourceTopic)
      .groupBy(StreamBuilder::groupByMapper, serdes.intSerde, serdes.longSerde)
      .aggregate(
          StreamBuilder::initialize,
          StreamBuilder::add,
          StreamBuilder::subtract,
          serdes.sinkValueSerde)
      .to(serdes.intSerde, serdes.sinkValueSerde, appConfig.sinkTopic);

  return new KafkaStreams(builder, appConfig.streamConfig);
}
My concrete example groups records as follows:
((k, v)) -> ((k), v[])
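For context, the `initialize`/`add`/`subtract` callbacks passed to `aggregate()` are not shown in the question; a hypothetical, dependency-free sketch of how such a `KGroupedTable` aggregation behaves (the adder applies a new value, the subtractor retracts the old one on update or delete) might look like this:

```java
import java.util.ArrayList;
import java.util.List;

public class AggregatorSketch {
    // Initializer: the empty aggregate for a key.
    static List<Long> initialize() {
        return new ArrayList<>();
    }

    // Adder: called when a record is added or updated for the key.
    static List<Long> add(Integer key, Long value, List<Long> agg) {
        agg.add(value);
        return agg;
    }

    // Subtractor: called to retract the previous value on update/delete.
    static List<Long> subtract(Integer key, Long value, List<Long> agg) {
        agg.remove(value);
        return agg;
    }

    public static void main(String[] args) {
        List<Long> agg = initialize();
        agg = add(1, 10L, agg);
        agg = add(1, 20L, agg);
        agg = subtract(1, 10L, agg); // old value retracted after an update
        System.out.println(agg); // [20]
    }
}
```

Note that in a real topology every one of these intermediate aggregate states is an update record that may be forwarded downstream, which is what the question observes.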
While running this with dummy data of 3,000,000 messages spread over only two unique keys, I ended up with about 10,000 messages in sinkTopic in less than a minute, whereas I expected to end up with either 4 or 6 (depending on the moment I manage to stop the application).
How can I ensure that only the key with the latest grouped value is committed back to Kafka, instead of every intermediate message?
It's stream processing, not batch processing. There is no "latest grouped value" -- the input is infinite, and thus the output is infinite, too.
You can only reduce the number of intermediate updates by tuning the record caches, i.e., by increasing `cache.max.bytes.buffering` and `commit.interval.ms`, so that more consecutive updates to the same key are deduplicated in the cache before being forwarded downstream.
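As an illustration of that tuning, here is a minimal sketch of a streams configuration with a larger cache and a longer commit interval. The property names are written as plain strings to keep the sketch dependency-free; with kafka-streams on the classpath you would typically use the corresponding `StreamsConfig` constants instead, and the `application.id` and `bootstrap.servers` values here are assumptions:

```java
import java.util.Properties;

public class StreamsCacheConfig {
    static Properties buildStreamConfig() {
        Properties props = new Properties();
        props.put("application.id", "aggregation-app");      // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        // Larger record cache: more per-key updates are merged before emission.
        props.put("cache.max.bytes.buffering", Long.toString(10 * 1024 * 1024L)); // 10 MB
        // Longer commit interval: the cache is flushed (and records emitted) less often.
        props.put("commit.interval.ms", "30000"); // 30 seconds
        return props;
    }

    public static void main(String[] args) {
        Properties p = buildStreamConfig();
        System.out.println(p.getProperty("cache.max.bytes.buffering"));
        System.out.println(p.getProperty("commit.interval.ms"));
    }
}
```

This only reduces the rate of intermediate updates; it cannot eliminate them, because the changelog stream of a KTable is by design a stream of updates.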