How to get windowed aggregation from a Kafka stream?
I have a stream of events which I would like to aggregate based on time windows. My solution gives an incremental aggregation rather than an aggregation over the timed window. I have read that this is normal for streams, as the results are given as a change-log. During my research I also came across 2 step windowed aggregation with Kafka Streams DSL and How to send final kafka-streams aggregation result of a time windowed KTable?. However, the solution in the first post is somewhat outdated (it uses a deprecated API), so I used the new API that is suggested in place of the deprecated one. This is my solution:
KStream<String, Event> eventKStream = summarizableData.mapValues(v -> v.getEvent());

KGroupedStream<String, Event> kGroupedStream = eventKStream.groupBy(
        (key, value) -> getGroupBy(value, criteria),
        Serialized.with(Serdes.String(), eventSerde));

long windowSizeMs = TimeUnit.SECONDS.toMillis(applicationProperties.getWindowSizeInSeconds());

final TimeWindowedKStream<String, Event> groupedByKeyForWindow = kGroupedStream
        .windowedBy(TimeWindows.of(windowSizeMs).advanceBy(windowSizeMs));
But my results, as I explained earlier, are not given per time window; they are given as an incremental aggregation. I need my data to be output once per window of the size given in windowSize. I have also read that CACHE_MAX_BYTES_BUFFERING_CONFIG can control the output, but I need a solid solution that works for every scenario. Also note that the pattern given in the https://cwiki.apache.org/confluence/display/KAFKA/Windowed+aggregations+over+successively+increasing+timed+windows wiki is now outdated, as it uses old APIs. (I'm using kafka-streams version 1.1.0.)
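For reference, the two configs that influence when cached aggregation updates are forwarded can be set as below. This is a minimal sketch with a made-up application id and a local broker address; I use the literal config keys that the StreamsConfig constants COMMIT_INTERVAL_MS_CONFIG and CACHE_MAX_BYTES_BUFFERING_CONFIG resolve to, so the snippet needs only the JDK:

```java
import java.util.Properties;

public class StreamsCacheConfig {
    // Build Streams properties that forward windowed aggregation updates promptly.
    public static Properties buildProps() {
        Properties props = new Properties();
        // Hypothetical application id and broker address, for illustration only.
        props.put("application.id", "windowed-aggregation-app");
        props.put("bootstrap.servers", "localhost:9092");
        // StreamsConfig.COMMIT_INTERVAL_MS_CONFIG: commit (and flush caches)
        // every 10 s instead of the 30 s default.
        props.put("commit.interval.ms", "10000");
        // StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG: 0 disables record
        // caching, so every aggregation update is forwarded downstream.
        props.put("cache.max.bytes.buffering", "0");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().getProperty("cache.max.bytes.buffering"));
    }
}
```

Note that even with caching disabled you still get one update per input record, not one final result per window, which is why this alone is not a "solid solution for every scenario".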
The problem was my own mistake. The code sample above works fine. But at the end I had converted the KTable to a KStream, and that was the problem: converting to a KStream causes the intermediate results to be output as well. The pattern given in https://cwiki.apache.org/confluence/display/KAFKA/Windowed+aggregations+over+successively+increasing+timed+windows works fine. My problematic code was:
// Aggregation
KTable<Windowed<String>, Event> results =
        groupedByKeyForWindow.aggregate(new AggregateInitiator(), new EventAggregator());

// This conversion causes the changelog to be output. Use the next line instead.
KStream<String, AggregationMessage> aggregationMessageKStream = results
        .toStream((key, value) -> key.toString())
        .mapValues(this::convertToAggregationMessage)
        .filter((k, v) -> v != null);

// Output the KTable to the "Sample" topic. This output is controlled by the
// COMMIT_INTERVAL_MS_CONFIG and CACHE_MAX_BYTES_BUFFERING_CONFIG parameters.
// I'm using the default values for these params.
results.to(windowedSerde, eventSerde, "Sample");
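As a side note for readers on newer versions: Kafka Streams 2.1+ added suppress(), which holds back updates and emits a single final result per window. This was not available in the 1.1.0 version used above. A sketch, reusing the AggregateInitiator/EventAggregator classes from my code (not runnable without the kafka-streams dependency):

```java
// Sketch only: requires kafka-streams 2.1 or newer (suppress() does not exist in 1.1.0).
KTable<Windowed<String>, Event> results =
        groupedByKeyForWindow.aggregate(new AggregateInitiator(), new EventAggregator());

results
        // Buffer updates until the window closes, then emit one final result per key.
        .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
        .toStream((windowedKey, value) -> windowedKey.toString())
        .to("Sample", Produced.with(Serdes.String(), eventSerde));
```

suppress(untilWindowCloses(...)) honours the window's grace period, so on 2.1+ you would typically also set an explicit grace on the TimeWindows definition to control how long the final result is delayed.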