When using multiple input topics, do different message rates on the input topics affect Kafka Streams processing speed?

I have 10 input topics (one topic per MySQL table) that I read from in my Kafka Streams app. Some topics have a very low message rate, while others have a slightly higher one. Occasionally, a couple of topics see a surge of messages. I wonder whether Kafka Streams processing of the faster topics will be stalled by the low message rate on the slower topics, and whether I should create separate source nodes in the topology to isolate the slower topics from the faster ones.

My streams app extracts information from each input message, calls another service to get more data, and writes the result to an output Kafka topic.

It depends...

If different topics are processed by different sub-topologies (cf. the output of Topology#describe()), then each topic is processed independently, and it has no impact if the topics have different data rates.

If you join or merge multiple topics (and thus they are processed by the same sub-topology), then the processing of those topics is coupled. This coupling is based on event timestamps: records from the coupled topics are processed roughly in timestamp order. Thus, a topic with a higher data rate most likely has "denser" record timestamps and gets more records processed than a topic with a lower data rate. For example:

// just showing timestamps
topic-1 (partition-0): 3 13 23 33 43 53 63 73 83 93 103 113...
topic-2 (partition-0):  5              55             105

processing order:
3 5 13 23 33 43 53 55 63 73 83 93 103 105 113

Hence, for each record of topic-2, 5 records of topic-1 would be processed.
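
To make the ordering above concrete, here is a minimal plain-Java sketch of the selection rule: among the partitions that feed one sub-topology, repeatedly take the head record with the smallest timestamp. It is only a simulation and does not use the Kafka Streams API. The class and method names (TimestampOrderSketch, processingOrder, partition) are made up for illustration, and it ignores how Kafka Streams buffers records and what it does when a partition has no buffered data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation, not Kafka Streams code.
public class TimestampOrderSketch {

    // Repeatedly pick the partition whose head record has the smallest
    // timestamp and "process" that record, until all partitions are drained.
    static List<Long> processingOrder(List<Deque<Long>> partitions) {
        List<Long> order = new ArrayList<>();
        while (true) {
            Deque<Long> next = null;
            for (Deque<Long> p : partitions) {
                if (!p.isEmpty() && (next == null || p.peekFirst() < next.peekFirst())) {
                    next = p;
                }
            }
            if (next == null) {
                return order; // every partition is empty
            }
            order.add(next.pollFirst());
        }
    }

    // Helper: build one partition as a queue of record timestamps.
    static Deque<Long> partition(long... timestamps) {
        Deque<Long> queue = new ArrayDeque<>();
        for (long t : timestamps) {
            queue.addLast(t);
        }
        return queue;
    }

    public static void main(String[] args) {
        Deque<Long> topic1 = partition(3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103, 113);
        Deque<Long> topic2 = partition(5, 55, 105);
        // Prints: [3, 5, 13, 23, 33, 43, 53, 55, 63, 73, 83, 93, 103, 105, 113]
        System.out.println(processingOrder(List.of(topic1, topic2)));
    }
}
```

In this simulation the slower topic does not hold back the faster one; each of its records is simply interleaved at the position its timestamp dictates.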
