Concurrency of Kafka streams topology with multiple output topics

Given a Kafka Streams topology that publishes messages to two different topics, is there any guarantee about the order in which the steps in these two branches are executed, or are the branches completely separate and executed in parallel?

Example

    KStream<..., ...> filteredStream = builder.stream("input-topic", ...).filter(...)...;

    filteredStream.mapValues(this::mapOne).to("output-topic-one", ...);
    filteredStream.flatMap(this::mapTwo).to("output-topic-two", ...);

In this example, will mapOne be executed and its result published to output-topic-one before mapTwo is even called or any messages are published to output-topic-two? In other words, is there a guarantee that mapOne will have finished before messages are published to output-topic-two?

Topology Visualization

When looking at the visualization of the topology description (shown below; made with https://zz85.github.io/kafka-streams-viz/), you can see the two branches. However, the numbers in each bubble might also indicate an order of execution (1-4, then 5-6-7, then 8-9).

[Image: Kafka Streams topology]

Topology Description

Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [input-topic])
      --> KSTREAM-FILTER-0000000001
    Processor: KSTREAM-FILTER-0000000001 (stores: [])
      --> KSTREAM-FILTER-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-MAP-0000000003
      <-- KSTREAM-FILTER-0000000001
    Processor: KSTREAM-MAP-0000000003 (stores: [])
      --> KSTREAM-FILTER-0000000004
      <-- KSTREAM-FILTER-0000000002
    Processor: KSTREAM-FILTER-0000000004 (stores: [])
      --> KSTREAM-MAPVALUES-0000000005, KSTREAM-FLATMAP-0000000008
      <-- KSTREAM-MAP-0000000003
    Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
      --> KSTREAM-FILTER-0000000006
      <-- KSTREAM-FILTER-0000000004
    Processor: KSTREAM-FILTER-0000000006 (stores: [])
      --> KSTREAM-SINK-0000000007
      <-- KSTREAM-MAPVALUES-0000000005
    Processor: KSTREAM-FLATMAP-0000000008 (stores: [])
      --> KSTREAM-SINK-0000000009
      <-- KSTREAM-FILTER-0000000004
    Sink: KSTREAM-SINK-0000000007 (topic: output-topic-one)
      <-- KSTREAM-FILTER-0000000006
    Sink: KSTREAM-SINK-0000000009 (topic: output-topic-two)
      <-- KSTREAM-FLATMAP-0000000008

Kafka Streams always processes a record in topology order. A topology is a graph of nodes (processors) and edges, and these are added to the topology in the order in which you define them in your application.

In your case, a record from the filtered stream first goes through the mapValues branch until that path ends (here, at the sink for output-topic-one).

It then continues with the flatMap branch until it reaches the sink for output-topic-two.

The IDs in the node names reflect this order:

0000000004 -> 0000000005 -> 0000000006 -> 0000000007

0000000004 -> 0000000008 -> 0000000009
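
The traversal described above can be illustrated with a small toy model. This is only a sketch in plain Java, not Kafka Streams code and not how the library is implemented internally: the `Node` class, the shortened node names, and the transformations (upper-casing for the mapValues branch, two suffixed copies for the flatMap branch) are all assumptions made for illustration, since `mapOne` and `mapTwo` are not shown in the question. It assumes each node forwards a record to its children in the order they were added, finishing one branch before starting the next. It models only the order in which nodes are invoked, not when a producer delivers records to the brokers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class DepthFirstDemo {

    /** Toy processor node (hypothetical): applies fn, records a trace entry, forwards to children. */
    static final class Node {
        final String name;
        final Function<String, List<String>> fn; // zero or more outputs per input
        final List<Node> children = new ArrayList<>();

        Node(String name, Function<String, List<String>> fn) {
            this.name = name;
            this.fn = fn;
        }

        /** Registers a child; children are visited in registration order. */
        Node addChild(Node child) {
            children.add(child);
            return child;
        }

        /** Depth-first: each output travels down a whole branch before the next branch starts. */
        void process(String value, List<String> trace) {
            for (String out : fn.apply(value)) {
                trace.add(name + ":" + out);
                for (Node child : children) {
                    child.process(out, trace);
                }
            }
        }
    }

    /** Builds a graph shaped like nodes 4 to 9 of the topology and pushes one record through it. */
    static List<String> run(String input) {
        Node filter4 = new Node("FILTER-4", v -> List.of(v));

        // Branch one, defined first: mapValues -> filter -> sink (output-topic-one).
        Node mapValues5 = filter4.addChild(new Node("MAPVALUES-5", v -> List.of(v.toUpperCase())));
        Node filter6 = mapValues5.addChild(new Node("FILTER-6", v -> List.of(v)));
        filter6.addChild(new Node("SINK-7", v -> List.of(v)));

        // Branch two, defined second: flatMap -> sink (output-topic-two).
        Node flatMap8 = filter4.addChild(new Node("FLATMAP-8", v -> List.of(v + "1", v + "2")));
        flatMap8.addChild(new Node("SINK-9", v -> List.of(v)));

        List<String> trace = new ArrayList<>();
        filter4.process(input, trace);
        return trace;
    }

    public static void main(String[] args) {
        run("a").forEach(System.out::println);
    }
}
```

For the input `a`, the trace is `FILTER-4:a`, `MAPVALUES-5:A`, `FILTER-6:A`, `SINK-7:A`, `FLATMAP-8:a1`, `SINK-9:a1`, `FLATMAP-8:a2`, `SINK-9:a2`. In this model, the first branch reaches its sink before the flatMap node is invoked at all.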

For more information, see the internal topology builder (InternalTopologyBuilder) in the Kafka source code.

