

When using multiple input topics, does a different message rate on input topics affect Kafka Streams processing speed?

I have 10 input topics (one topic per MySQL table) that I am reading from in my Kafka Streams app. Some topics have a very low message rate while others have a slightly higher rate, and occasionally a couple of topics see a surge of messages. Will Kafka Streams processing of the faster topics be stalled by the low message rate on the slower topics? And should I create separate source nodes in the topology to isolate the slower topics from the faster ones?

My streams app extracts information from each input message, calls another service to get more data, and writes the result to an output Kafka topic.

It depends...

If different topics are processed by different sub-topologies (cf. the output of Topology#describe()), then each topic is processed independently, and different data rates on different topics have no impact on each other.

If you join or merge multiple topics (and thus they are processed by the same sub-topology), then the progress on those topics is "coupled". This coupling is based on event timestamps: within a task, Kafka Streams picks the next record to process from the partition whose head record has the smallest timestamp. Thus, a topic with a higher data rate most likely has "denser" record timestamps and therefore gets more of its records processed than the topic with a lower data rate. For example:

// just showing timestamps
topic-1 (partition-0): 3 13 23 33 43 53 63 73 83 93 103 113...
topic-2 (partition-0):  5              55             105

processing order:
3 5 13 23 33 43 53 55 63 73 83 93 103 105 113

Hence, for each record of topic-2, five records of topic-1 would be processed.
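The timestamp-based selection above can be sketched as a small simulation. This is plain Python modeling the ordering rule, not the actual Streams API; the topic names and timestamps are taken from the example:

```python
import heapq

# Record timestamps from the example above (one partition per topic).
topic_1 = [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103, 113]
topic_2 = [5, 55, 105]

def processing_order(*partitions):
    """Repeatedly pick the partition whose head record has the smallest
    timestamp -- a simplified model of how Kafka Streams synchronizes
    partitions that feed the same sub-topology."""
    # Heap of (head timestamp, partition index, offset within partition).
    heads = [(p[0], i, 0) for i, p in enumerate(partitions) if p]
    heapq.heapify(heads)
    order = []
    while heads:
        ts, i, pos = heapq.heappop(heads)
        order.append(ts)
        if pos + 1 < len(partitions[i]):
            heapq.heappush(heads, (partitions[i][pos + 1], i, pos + 1))
    return order

print(processing_order(topic_1, topic_2))
# [3, 5, 13, 23, 33, 43, 53, 55, 63, 73, 83, 93, 103, 105, 113]
```

The output reproduces the processing order shown above: the denser topic-1 contributes five records between each pair of topic-2 records.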

