
Kafka Streams Flow Of Control

I have a basic question regarding the flow of control in a Kafka Streams application. Suppose there are two source topics, A and B, and A has records with timestamps that are earlier than those in B. Is there a guarantee of the order in which the records would be processed by the streaming application?

I did a very rudimentary test: I peeked at the records as they were being consumed and printed the instant at which each one was processed, via a simple sysout of Instant.now().

KStream<String, String> akStream= builder.stream("A",
        Consumed.with(Serdes.String(), Serdes.String()).withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST))
        .peek((s, string) -> System.out.println("Topic A at " + Instant.now() ));

KStream<String, String> bkStream= builder.stream("B",
        Consumed.with(Serdes.String(), Serdes.String()))
        .peek((s, string) -> System.out.println("Topic B " + Instant.now()));

These are the begin and end timestamps for the records in the topics:

A : 2020-03-27 14:36:04 (epoch: 1585316164843) 2020-03-27 14:34:02 (epoch: 1585316042569)
B : 2020-03-30 11:04:17 (epoch: 1585559057167) 2020-03-17 14:44:38 (epoch: 1584452678527)

Topic B records get picked up before Topic A. The sysout shows all records from topic B. Can someone help me understand this? I would like to use this understanding when writing streaming applications with multiple input sources.

Thanks in advance.

The way you have built your streams, each stream exists on its own, so there is no ordering guarantee.

As for processing the records based on timestamp: you can do this only within a time window. For example, if you have two topics A and B, you can join them, and within a time window you can order the events.

<VO,VR> KStream<K,VR> join(KStream<K,VO> otherStream,
                           ValueJoiner<? super V,? super VO,? extends VR> joiner,
                           JoinWindows windows)

It depends. In general, there are no guarantees about processing order between different topics. There is one exception though: if a single task processes data from different topics, then records will be processed in timestamp order. However, it's a best-effort approach; as of Kafka Streams 2.3, those ordering guarantees were improved, and you can influence them using the max.task.idle.ms configuration.
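To make this concrete, here is a rough sketch in plain Java (no Kafka dependency). `buildProps` shows where `max.task.idle.ms` would go in the properties passed to a Kafka Streams application; the application id, broker address, and the 5000 ms value are assumptions for illustration only. `pickOrder` is a simplified model, not Kafka Streams code, of how a single task that reads from two topics could choose the next record: it compares the head record of each buffer and takes the one with the smaller timestamp. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class StreamsOrderingSketch {

    // Builds properties of the kind you would hand to a Kafka Streams application.
    // "max.task.idle.ms" is the real config key; the other values are placeholders.
    static Properties buildProps(long maxTaskIdleMs) {
        Properties props = new Properties();
        props.put("application.id", "ordering-demo");      // assumed name
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        // How long a task may wait for data on an empty input buffer
        // before processing what it already has from the other inputs.
        props.put("max.task.idle.ms", Long.toString(maxTaskIdleMs));
        return props;
    }

    // Simplified model: both buffers are already filled and each is sorted by
    // timestamp. The task always takes the head record with the smaller timestamp.
    static List<String> pickOrder(List<Long> topicA, List<Long> topicB) {
        List<String> order = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < topicA.size() || j < topicB.size()) {
            boolean takeA = j >= topicB.size()
                    || (i < topicA.size() && topicA.get(i) <= topicB.get(j));
            if (takeA) {
                order.add("A@" + topicA.get(i++));
            } else {
                order.add("B@" + topicB.get(j++));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(buildProps(5000L).getProperty("max.task.idle.ms"));
        System.out.println(pickOrder(List.of(1L, 4L), List.of(2L, 3L)));
    }
}
```

The model only holds when both buffers actually contain data, which is why the ordering is best effort: if one buffer is empty, the task either processes the other input right away or waits up to `max.task.idle.ms`. In the question's topology the two streams are never joined or merged, so they would not be expected to share a task in the first place.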
