
Kafka-Streams Join 2 topics with JSON values | backpressure mechanism?

I'm learning Kafka Streams and am trying to achieve the following:

I created 2 Kafka topics (say topic1 and topic2) with null as the key and a JSON string as the value. Each record from topic1 (no duplicates) has multiple matching entries in topic2. That is, topic1 carries the main data, and joining it with topic2 should generate multiple new records.

Example:

topic1={"name": "abc", "age":2}, {"name": "xyz", "age":3} and so on.
topic2={"name": "abc", "address"="xxxxxx"}, {"name": "abc", "address"="yyyyyy"}, {"name": "xyz", "address"="jjjjjj"}, {"name": "xyz", "address"="xxxkkkkk"}

Expected output: {"name": "abc", "age": 2, "address": "xxxxxx"}, {"name": "abc", "age": 2, "address": "yyyyyy"}, {"name": "xyz", "age": 3, "address": "jjjjjj"}, {"name": "xyz", "age": 3, "address": "xxxkkkkk"}

I would like to persist/hold the data stream from topic1 for future reference, while the data stream from topic2 is only used to achieve the use case above and doesn't require any persistence/holding back.

I have a few questions:

1) Should I hold/store the topic1 data stream for a few days (is that possible?) so that incoming records from topic2 can be joined against it?
2) What should I use to achieve this, a KStream or a KTable?
3) Is this called a backpressure mechanism?

Does Kafka Streams support this use case, or should I look for something else? Please suggest.

I have tried a piece of code with KStream and a 5-minute window, but it looks like I'm not able to hold topic1 data in the stream.

Please help me with the right choice and join. I'm using Kafka from Confluent with a Docker instance.

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.connect.json.JsonDeserializer;
import org.apache.kafka.connect.json.JsonSerializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;

public void run() {
    final StreamsBuilder builder = new StreamsBuilder();
    // JSON serde backed by Kafka Connect's JsonNode (de)serializers.
    final Serde<JsonNode> jsonSerde = Serdes.serdeFrom(new JsonSerializer(), new JsonDeserializer());
    final Consumed<String, JsonNode> consumed = Consumed.with(Serdes.String(), jsonSerde);

    // Hold data from this topic for 30 days.
    KStream<String, JsonNode> cs = builder.stream("topic1", consumed);
    cs.foreach((k, v) -> System.out.println(k + " ---> " + v));

    // Data is involved in a one-time process.
    KStream<String, JsonNode> css = builder.stream("topic2", consumed);
    css.foreach((k, v) -> System.out.println(k + " ---> " + v));

    // Merge the two JSON objects; the right side may be null for a left join.
    final ValueJoiner<JsonNode, JsonNode, JsonNode> valueJoiner = (left, right) -> {
        ObjectNode merged = JsonNodeFactory.instance.objectNode();
        merged.setAll((ObjectNode) left);
        if (right != null) {
            merged.setAll((ObjectNode) right);
        }
        return merged;
    };

    KStream<String, JsonNode> resultStream = cs.leftJoin(css,
            valueJoiner,
            JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
            Joined.with(
                    Serdes.String(), /* key */
                    jsonSerde,       /* left value */
                    jsonSerde));     /* right value */

    resultStream.foreach((k, v) -> System.out.println("JOIN STREAM: KEY=" + k + ", VALUE=" + v));

    // Minimal Streams configuration; values are placeholders.
    final Properties properties = new Properties();
    properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-test");
    properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    KafkaStreams streams = new KafkaStreams(builder.build(), properties);
    streams.start();
}

Joins in Kafka are always based on keys. (*) Thus, to make any join work, you need to extract the fields you want to join on into the key before you do the actual join (the only partial exception would be a KStream-GlobalKTable join). In your code example, you won't get any results because all records have a null key and cannot be joined for this reason.
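
For illustration, re-keying on the name field could look like this (a minimal sketch; cs and css are the two streams from the question's code, the variable names keyedCs and keyedCss are illustrative, and using name as the join field follows the question's example data):

// Extract the join field into the key so the records are no longer null-keyed.
// Kafka Streams will mark both streams for repartitioning before the join.
KStream<String, JsonNode> keyedCs = cs.selectKey((k, v) -> v.get("name").asText());
KStream<String, JsonNode> keyedCss = css.selectKey((k, v) -> v.get("name").asText());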

For the join itself, it seems that a KStream-KTable join would be the right choice for your use case. To make this work, you will need to do the following (a sketch is shown after the list):

  1. set the join key correctly for topic1 and write the data into an additional topic (let's call it topic1Keyed)
  2. read topic1Keyed as a table
  3. set the join key correctly for topic2
  4. join topic2 with the KTable
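
Putting the four steps together might look roughly like this (a sketch that reuses builder, consumed, and jsonSerde from the question's code; the output topic name and the JSON merge logic are illustrative assumptions, not part of the original answer):

// Step 1: re-key topic1 on "name" and write it to an additional topic.
builder.stream("topic1", consumed)
       .selectKey((k, v) -> v.get("name").asText())
       .to("topic1Keyed", Produced.with(Serdes.String(), jsonSerde));

// Step 2: read topic1Keyed as a KTable. A table keeps the latest value per
// key, so topic1 data remains available for future joins (question 1 above).
KTable<String, JsonNode> persons = builder.table("topic1Keyed", consumed);

// Steps 3 and 4: re-key topic2 on "name" and join it against the table.
KStream<String, JsonNode> enriched = builder.stream("topic2", consumed)
        .selectKey((k, v) -> v.get("name").asText())
        .join(persons, (address, person) -> {
            // Merge {"name","address"} from topic2 with {"name","age"} from topic1.
            ObjectNode merged = JsonNodeFactory.instance.objectNode();
            merged.setAll((ObjectNode) person);
            merged.setAll((ObjectNode) address);
            return merged;
        });

enriched.to("joined-output", Produced.with(Serdes.String(), jsonSerde)); // hypothetical topic name

Because selectKey marks the topic2 stream for repartitioning, Kafka Streams inserts the required repartition topic automatically, so the stream and the table end up co-partitioned before the join.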

For full details about join semantics, check out this blog post: https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/

(*) UPDATE:

Since the 2.4 release, Kafka Streams also supports foreign-key table-table joins.
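
Here is a minimal sketch of that 2.4+ API, under the assumption that both inputs are modeled as tables and that topic2 records are keyed by some unique address id (the question's null-keyed topics would still need such a key); the topic names are illustrative:

// "persons" is keyed by name, "addresses" by a unique address id (assumption).
KTable<String, JsonNode> persons = builder.table("topic1Keyed", consumed);
KTable<String, JsonNode> addresses = builder.table("topic2Keyed", consumed);

// Foreign-key join: each address row points at a person row via its "name"
// field, so no manual re-keying of the address table is required.
KTable<String, JsonNode> joined = addresses.join(
        persons,
        address -> address.get("name").asText(),  // foreign-key extractor
        (address, person) -> {
            ObjectNode merged = JsonNodeFactory.instance.objectNode();
            merged.setAll((ObjectNode) person);
            merged.setAll((ObjectNode) address);
            return merged;
        });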
