How to use multiple transformers using the same topic for Kafka Streams?

I need to parse complex messages on Kafka using multiple transformers. Each transformer parses one part of the message and fills in the corresponding attributes on it. In the end, the fully parsed message is stored in the database by a Kafka consumer. Currently, I'm doing this:

streamsBuilder.stream(Topic.A, someConsumer)
       // filters messages that have unparsed parts of type X
       .filter(filterX)
       // transformer that edits the message and produces new Topic.E messages
       .transform(ParseXandProduceE::new)
       .to(Topic.A, someProducer);

streamsBuilder.stream(Topic.A, someConsumer)
       // filters messages that have unparsed parts of type Y
       .filter(filterY)
       // transformer that edits the message and produces new Topic.F messages
       .transform(ParseYandProduceF::new)
       .to(Topic.A, someProducer);

A Transformer looks like:

class ParseXandProduceE implements Transformer<...> {
    // `context` is the ProcessorContext saved in init() (not shown)
    @Override
    public KeyValue<String, Message> transform(String key, Message message) {
        message.x = parse(message.rawX);
        // emit the parsed part as a separate record for Topic.E
        context.forward(newKey, message.x, Topic.E);
        return KeyValue.pair(key, message);
    }
}

However, this is cumbersome: the same messages flow through these streams multiple times. Additionally, there is a consumer that stores the messages of Topic.A in the database. Messages are currently stored multiple times, before and after each transformation. Each message needs to be stored exactly once.

The following could work, but seems unfavorable, since each filter+transform block could otherwise have been put cleanly in its own separate class:

streamsBuilder.stream(Topic.A, someConsumer)
       // transformer that filters and edits the message and produces new Topic.E + Topic.F messages
       .transform(someTransformer)
       .to(Topic.B, someProducer);

and make the persistence consumer listen to Topic.B.

Is the latter proposal the way to go, or is there some other way to achieve the same result? Maybe with a complete Topology configuration of sources and sinks? If so, what would that look like for this scenario?

Using a single transformer seems to be the simplest solution. Because you have two independent filters, the program becomes more complex if you try to chain individual operators. If you know that each message will pass only one of the filters, never both, you could use branch():

KStream[] subStreams = stream.branch(filterX, filterY);

subStreams[0].transform(ParseXandProduceE::new)
             .merge(subStreams[1].transform(ParseYandProduceF::new))
             .to(...);
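To make the routing in the snippet above concrete, here is a minimal plain-Java sketch with no Kafka dependency. It only mimics the first-match behaviour of branch(): `firstMatchBranch` and the string messages are hypothetical stand-ins, not part of the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BranchSketch {
    // Hypothetical helper, not a Kafka Streams API: routes each item to the
    // first predicate it matches and drops it if no predicate matches.
    @SafeVarargs
    static <T> List<List<T>> firstMatchBranch(List<T> input, Predicate<T>... predicates) {
        List<List<T>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            branches.add(new ArrayList<>());
        }
        for (T item : input) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(item)) {
                    branches.get(i).add(item);
                    break; // first match wins; later predicates are not consulted
                }
            }
        }
        return branches;
    }

    public static void main(String[] args) {
        // Assumed stand-ins for filterX and filterY from the question.
        Predicate<String> filterX = m -> m.contains("X");
        Predicate<String> filterY = m -> m.contains("Y");
        List<List<String>> out =
                firstMatchBranch(List.of("X", "Y", "XY", "Z"), filterX, filterY);
        System.out.println(out.get(0)); // prints [X, XY]
        System.out.println(out.get(1)); // prints [Y]
    }
}
```

In this model the message "XY" lands only in the first branch and "Z" is dropped, so a message that matches both filters would reach only the first transformer.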

Note that the solution above only works if no message needs to be transformed by both transformers (branch() puts every message into the branch of the first matching predicate, never into multiple branches). Thus, if a message could pass both filters, you need to do something more complicated, like this:

KStream[] subStreams = stream.branch(filterX, filterY);

KStream passedX = subStreams[0];
KStream transformedXE = passedX.transform(ParseXandProduceE::new);

// a message that passed filterX may also pass filterY,
// and thus we merge those messages back into the "y-stream"
// (of course, those messages have already been transformed by `ParseXandProduceE`)
KStream passedY = subStreams[1].merge(transformedXE.filter(filterY));

// the result contains all messages that only passed filterX and got transformed,
// plus all messages that passed filterY (and maybe also filterX) and got transformed
KStream result = transformedXE.filterNot(filterY)
                              .merge(passedY.transform(ParseYandProduceF::new));

result.to(...);
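For the single-transformer option mentioned at the start of this answer, each filter+parse block can still live in its own class if the transformer simply delegates to a list of steps. The following is a rough plain-Java sketch of that idea, not Kafka Streams code: `ParseStep`, `runAll`, the `Message` fields, and the `emit` callback (standing in for `context.forward(...)`) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class CompositeParseSketch {
    // Stand-in for the question's Message type; all fields are assumptions.
    static class Message {
        String rawX, rawY, x, y;
        Message(String rawX, String rawY) { this.rawX = rawX; this.rawY = rawY; }
    }

    // One filter+parse block. `emit` takes (topic, value) and stands in for
    // forwarding a record to a side topic.
    interface ParseStep {
        boolean applies(Message m);
        void apply(Message m, BiConsumer<String, String> emit);
    }

    static class ParseX implements ParseStep {
        public boolean applies(Message m) { return m.rawX != null && m.x == null; }
        public void apply(Message m, BiConsumer<String, String> emit) {
            m.x = m.rawX.trim();   // placeholder for the real parsing of part X
            emit.accept("E", m.x);
        }
    }

    static class ParseY implements ParseStep {
        public boolean applies(Message m) { return m.rawY != null && m.y == null; }
        public void apply(Message m, BiConsumer<String, String> emit) {
            m.y = m.rawY.trim();   // placeholder for the real parsing of part Y
            emit.accept("F", m.y);
        }
    }

    // What the body of a single transformer's transform() could delegate to:
    // every applicable step runs once, and the message is returned once.
    static Message runAll(Message m, List<ParseStep> steps, BiConsumer<String, String> emit) {
        for (ParseStep step : steps) {
            if (step.applies(m)) {
                step.apply(m, emit);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Message m = runAll(new Message(" a ", null),
                List.of(new ParseX(), new ParseY()),
                (topic, value) -> emitted.add(topic + ":" + value));
        System.out.println(m.x + " " + m.y + " " + emitted); // prints a null [E:a]
    }
}
```

With this shape, a message is written downstream once after all applicable steps have run, which matches the requirement that the persistence consumer stores each message only once.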
