
Most efficient number of threads in Kafka Streams

I am using Kafka Streams with one topic that has 3 partitions.

I want to know the most efficient number of threads to set in the Kafka Streams num.stream.threads option.

Which is better: 1 thread running 3 tasks, or 3 threads running 1 task each?

P.S. The server has a 3-core CPU.

The answer is: it depends! Typically, it is more efficient to have as many threads as partitions/tasks, as this gives you better parallelism. But having too many threads can also be disastrous because of context switching if you don't have enough CPU.
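As a rough illustration of that trade-off, here is a minimal sketch that caps the thread count at the smaller of the task count and the number of available cores, then writes it into the num.stream.threads setting. The helper name suggestedThreads, the application id, and the broker address are hypothetical, and the min(tasks, cores) rule is only an assumed upper bound to start measuring from, not an official recommendation.

```java
import java.util.Properties;

public class StreamThreadsConfig {

    // Hypothetical helper: never more threads than tasks (extra threads would sit idle)
    // and never more than the available cores (to limit context switching).
    static int suggestedThreads(int taskCount, int availableCores) {
        return Math.max(1, Math.min(taskCount, availableCores));
    }

    // Builds plain Properties that could be passed to a Kafka Streams application.
    static Properties buildConfig(int taskCount, int availableCores) {
        Properties props = new Properties();
        props.setProperty("application.id", "thread-sizing-demo"); // hypothetical name
        props.setProperty("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.setProperty("num.stream.threads",
                String.valueOf(suggestedThreads(taskCount, availableCores)));
        return props;
    }

    public static void main(String[] args) {
        // 3 tasks, as in the question (one topic with 3 partitions).
        int cores = Runtime.getRuntime().availableProcessors();
        Properties props = buildConfig(3, cores);
        System.out.println("num.stream.threads = " + props.getProperty("num.stream.threads"));
    }
}
```

For the setup in the question (3 tasks, 3 cores) this yields 3 threads, but whether that actually beats a single thread still has to be confirmed under load.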

You must also consider the throughput of the data to be processed, as well as the cost of the operation performed on each record. If your streams application is not really data intensive, there is little point in allocating a large number of threads, as they will be idle most of the time.

It is therefore best to start with a single thread and run load tests to measure the performance of your application. To do this, you can use the command-line tool available in the Apache Kafka (or Confluent) distribution, i.e., bin/kafka-producer-perf-test.sh, and monitor the metrics exposed by Kafka Streams using JMX (see: Monitoring Kafka Streams - Confluent Documentation).

Moreover, note that the maximum number of threads you can allocate to your application is not necessarily equal to the number of partitions of the input topic declared in your topology. You should also consider all the topics from all the sub-topologies generated by your application.

For example, let's say you are consuming a topic with 3 partitions, but your application performs a repartition operation. You will then end up with two sub-topologies, each consuming one topic with 3 partitions. So you will have a total of 6 tasks, which means you can configure up to 6 threads.
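The arithmetic in this example can be sketched as follows. The class and method names are hypothetical. The sketch assumes that each sub-topology gets as many tasks as the largest partition count among its source topics, and that the total is the sum over all sub-topologies.

```java
import java.util.List;

public class TaskCount {

    // Each inner list holds the partition counts of the source topics of one sub-topology.
    // Assumption: a sub-topology gets one task per partition of its largest source topic.
    static int totalTasks(List<List<Integer>> subTopologies) {
        int total = 0;
        for (List<Integer> partitionCounts : subTopologies) {
            int max = 0;
            for (int partitions : partitionCounts) {
                max = Math.max(max, partitions);
            }
            total += max;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sub-topology 1 reads the 3-partition input topic;
        // sub-topology 2 reads the 3-partition repartition topic.
        List<List<Integer>> example = List.of(List.of(3), List.of(3));
        System.out.println(totalTasks(example)); // prints 6
    }
}
```

The result is the upper bound on threads that can have work assigned across the whole application. Threads beyond that number would have no task to run.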

Note: It is usually recommended to deploy a KafkaStreams instance with a single thread and to scale horizontally by adding more instances. This simplifies the scaling model, especially when using Kubernetes (i.e., 1 pod = 1 KafkaStreams instance = 1 thread).

