
Kafka Streams Flow Of Control

I have a basic question about the flow of control in a Kafka Streams application. Suppose there are two source topics, A and B, and the records in A have earlier timestamps than the records in B. Is there any guarantee about the order in which the streaming application processes the records?

I ran a very rudimentary test: I peeked at the records as they were consumed and printed the instant at which each one was processed, using a simple System.out.println of Instant.now().

KStream<String, String> akStream= builder.stream("A",
        Consumed.with(Serdes.String(), Serdes.String()).withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST))
        .peek((s, string) -> System.out.println("Topic A at " + Instant.now() ));

KStream<String, String> bkStream= builder.stream("B",
        Consumed.with(Serdes.String(), Serdes.String()))
        .peek((s, string) -> System.out.println("Topic B " + Instant.now()));

These are the begin and end timestamps for the records in the topics

A : 2020-03-27 14:36:04 (epoch: 1585316164843) 2020-03-27 14:34:02 (epoch: 1585316042569)
B : 2020-03-30 11:04:17 (epoch: 1585559057167) 2020-03-17 14:44:38 (epoch: 1584452678527)

Topic B records get picked up before Topic A records. The console output shows all the records from topic B. Can someone help me understand this? I would like to apply this understanding when writing streaming applications with multiple input sources.

Thanks in advance

The way you have built your streams, each stream exists on its own, so there is no ordering guarantee between them.

As for processing records based on their timestamps: you can do this only within a time window. For example, if you have two topics A and B, you can join them, and within the time window you can order the events.

<VO,VR> KStream<K,VR> join(KStream<K,VO> otherStream,
                           ValueJoiner<? super V,? super VO,? extends VR> joiner,
                           JoinWindows windows)
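
To make the window condition concrete, here is a minimal plain-Java sketch. It does not use the Kafka Streams API; the class `WindowJoinSketch` and the method `withinWindow` are hypothetical names made up for illustration. It assumes that two records with the same key are joinable when their timestamps differ by at most the window size, which is my understanding of how JoinWindows behaves. The sample timestamps are the first epoch values listed for A and B in the question.

```java
public class WindowJoinSketch {

    // Hypothetical helper: true if two record timestamps (epoch millis)
    // are no more than windowMs apart, i.e. inside the join window.
    static boolean withinWindow(long tsA, long tsB, long windowMs) {
        return Math.abs(tsA - tsB) <= windowMs;
    }

    public static void main(String[] args) {
        long tsA = 1585316164843L;          // timestamp from topic A in the question
        long tsB = 1585559057167L;          // timestamp from topic B in the question
        long fiveMinutes = 5 * 60 * 1000L;  // assumed window size, for illustration only

        // The two timestamps are almost three days apart, so a five-minute
        // window would not pair these records.
        System.out.println(withinWindow(tsA, tsB, fiveMinutes));
    }
}
```

With a window that small, records that are days apart are never paired, so the window has to be chosen to match how far apart related events in A and B can actually be.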

It depends. In general, there are no guarantees about processing order between different topics. There is one exception, though: if a single task processes data from different topics, then records will be processed in timestamp order. However, this is a best-effort approach; as of Kafka Streams 2.3, those ordering guarantees were improved, and you can influence them using the max.task.idle.ms configuration.
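
As a rough sketch of where that setting goes, the snippet below builds the java.util.Properties object that would be handed to the KafkaStreams constructor. The class name `TaskIdleConfigSketch`, the application id, the broker address, and the one-second idle time are all assumptions for illustration. Plain string keys are used so the snippet compiles without the Kafka libraries on the classpath; in a real application the StreamsConfig constants would normally be used instead. Note that this only matters when one task reads both topics, for example when the two streams are joined or merged; the two independent streams in the question would, as far as I can tell, end up in separate tasks.

```java
import java.util.Properties;

public class TaskIdleConfigSketch {

    // Builds a Kafka Streams configuration in which a task waits up to
    // maxTaskIdleMs for data on all of its input partitions before it
    // picks the next record by timestamp.
    static Properties buildConfig(long maxTaskIdleMs) {
        Properties props = new Properties();
        props.put("application.id", "flow-of-control-demo"); // hypothetical application id
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        props.put("max.task.idle.ms", Long.toString(maxTaskIdleMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig(1000L); // assumed value: wait up to one second
        System.out.println(props.getProperty("max.task.idle.ms"));
    }
}
```

A larger idle time gives slower partitions more chance to deliver data before the task moves on, at the cost of added processing latency.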
