
Persisting state into Kafka using Kafka Streams

I am trying to wrap my head around Kafka Streams and have some fundamental questions that I can't seem to figure out on my own. I understand the concept of a KTable and Kafka state stores, but I am having trouble deciding how to approach this. I am also using Spring Cloud Stream, which adds another level of complexity on top of it.

My use case:

I have a rule engine that reads in a Kafka event, processes it, returns a list of matched rules, and writes them to another topic. This is what I have so far:

@Bean
public Function<KStream<String, ProcessNode>, KStream<String, List<IndicatorEvaluation>>> process() {
    return input -> input.mapValues(this::analyze).filter((host, evaluation) -> evaluation != null);
}

public List<IndicatorEvaluation> analyze(final String host, final ProcessNode process) {
    // Does stuff
}

Some of the stateful rules look like:

[some condition] REPEATS 5 TIMES WITHIN 1 MINUTE
[some condition] FOLLOWEDBY [some condition] WITHIN 1 MINUTE
[rule A exists and rule B exists]

My current implementation stores all of this information in memory in order to perform the analysis. For obvious reasons, that does not scale well, so I figured I would persist it into a Kafka state store.

I am unsure of the best way to go about it. I know there is a way to create custom state stores, which would allow for a higher level of flexibility, but I am not sure whether the Kafka Streams DSL supports this.
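
For reference, here is a minimal sketch of what a custom state store used from within the DSL could look like, via `transformValues`. The store name `rule-state`, the per-host `Long` counter, and the `stateful` function are purely illustrative assumptions, not taken from the question. The Spring Cloud Stream Kafka Streams binder can typically detect a `StoreBuilder` bean and add it to the topology, but that is worth verifying against the binder version in use.

import java.util.function.Function;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.kstream.ValueTransformerWithKeySupplier;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RuleStateTopology {

    // Hypothetical store name; the Long value simply counts matches per host.
    private static final String RULE_STORE = "rule-state";

    // The Kafka Streams binder can pick up a StoreBuilder bean and add the
    // (persistent, changelog-backed) store to the topology before processing starts.
    @Bean
    public StoreBuilder<KeyValueStore<String, Long>> ruleStore() {
        return Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore(RULE_STORE),
                Serdes.String(),
                Serdes.Long());
    }

    @Bean
    public Function<KStream<String, String>, KStream<String, Long>> stateful() {
        ValueTransformerWithKeySupplier<String, String, Long> perHostCounter =
                () -> new ValueTransformerWithKey<String, String, Long>() {
                    private KeyValueStore<String, Long> store;

                    @SuppressWarnings("unchecked")
                    @Override
                    public void init(final ProcessorContext context) {
                        store = (KeyValueStore<String, Long>) context.getStateStore(RULE_STORE);
                    }

                    @Override
                    public Long transform(final String host, final String event) {
                        // Example stateful logic: how many events has this host matched so far?
                        final Long seen = store.get(host);
                        final long updated = (seen == null ? 0L : seen) + 1L;
                        store.put(host, updated);
                        return updated;
                    }

                    @Override
                    public void close() {
                    }
                };

        // transformValues keeps you inside the DSL while giving direct access
        // to the named state store, which is fault tolerant via its changelog topic.
        return input -> input.transformValues(perHostCounter, RULE_STORE);
    }
}

On newer Kafka Streams versions `transformValues` is deprecated in favour of `processValues`, but the idea is the same.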

I am still new to Kafka Streams and wouldn't mind hearing a variety of suggestions.

From the description you have given, I believe this use case can still be implemented using the DSL in Kafka Streams. The code you have shown above does not track any state. In your topology, you need to add state by tracking the counts of the rules and storing them in a state store. Then you only need to send the output rules when that count hits a threshold. Here is the general idea behind this as pseudo-code. Obviously, you will have to tweak it to satisfy the particular specifications of your use case.

@Bean
public Function<KStream<String, ProcessNode>, KStream<String, List<IndicatorEvaluation>>> process() {
    return input -> input
                     .mapValues(this::analyze)
                     .filter((host, evaluation) -> evaluation != null)
                     ...
                     .groupByKey(...)
                     .windowedBy(TimeWindows.of(Duration.ofHours(1)))
                     .count(Materialized.as("rules"))
                     .filter((key, value) -> value > 4)
                     .toStream()
                    ....
}
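
To make that a bit more concrete, below is a minimal compilable sketch of the same windowed-count idea, assuming String keys and values, a one-minute window to match the "REPEATS 5 TIMES WITHIN 1 MINUTE" rule, and Kafka Streams 3.x (`TimeWindows.ofSizeWithNoGrace`; older versions use `TimeWindows.of`). The key/value types, serdes, and the mapping back to `IndicatorEvaluation` would need to be adapted to the real `ProcessNode` pipeline.

import java.time.Duration;
import java.util.function.Function;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WindowedRuleTopology {

    @Bean
    public Function<KStream<String, String>, KStream<String, Long>> repeatsWithinWindow() {
        return input -> input
                // group by host (the record key) so counts are tracked per host
                .groupByKey()
                // one-minute tumbling window, matching "WITHIN 1 MINUTE"
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                // the count lives in a windowed state store named "rules",
                // which Kafka Streams backs with a changelog topic
                .count(Materialized.as("rules"))
                // "REPEATS 5 TIMES": only pass counts that reached the threshold
                .filter((windowedHost, count) -> count != null && count >= 5)
                .toStream()
                // unwrap the windowed key back to the plain host key
                .map((windowedHost, count) -> KeyValue.pair(windowedHost.key(), count));
    }
}

As written, the table emits an update every time the count passes the threshold within the window (5, 6, 7, ...); adding `suppress(Suppressed.untilWindowCloses(BufferConfig.unbounded()))` before `toStream()` would emit a single result per window instead.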
