
Most efficient number of threads in Kafka Streams

I am using Kafka Streams with one topic (it has 3 partitions).

I want to know the most efficient number of threads to set with the Kafka Streams num.stream.threads option.

1 thread with 3 tasks vs. 3 threads with 1 task each: which one is better?

PS: The server has a 3-core CPU.

The answer is: it depends! Typically, it is more efficient to have as many threads as partitions/tasks, as this gives you better parallelism. But having too many threads can also be disastrous because of context switching if you don't have enough CPU.
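As a minimal sketch of how the thread count is set, the snippet below builds the configuration properties with an explicit num.stream.threads value. The class name, application id, and broker address are placeholders invented for illustration. To stay self-contained it uses plain string keys; in a real application you would typically use the StreamsConfig constants (e.g. StreamsConfig.NUM_STREAM_THREADS_CONFIG) and pass the properties to the KafkaStreams constructor.

```java
import java.util.Properties;

final class StreamThreadsConfig {

    // Builds Kafka Streams properties with an explicit stream thread count.
    // "my-streams-app" and "localhost:9092" are placeholder values (assumptions).
    static Properties build(int numStreamThreads) {
        if (numStreamThreads < 1) {
            throw new IllegalArgumentException("numStreamThreads must be >= 1");
        }
        Properties props = new Properties();
        props.setProperty("application.id", "my-streams-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Kafka Streams defaults to 1 thread when this option is not set.
        props.setProperty("num.stream.threads", String.valueOf(numStreamThreads));
        return props;
    }

    public static void main(String[] args) {
        // One thread per partition for the 3-partition topic from the question.
        Properties props = build(3);
        System.out.println("num.stream.threads = " + props.getProperty("num.stream.threads"));
    }
}
```

With 3 partitions and 3 cores, a value of 3 is a reasonable candidate to compare against the default of 1 in a load test.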

You must also consider the throughput of the data to be processed, as well as the cost of the operation performed on each record. If your streams application is not very data-intensive, there is little point in allocating a large number of threads, as they will be idle most of the time.

It is therefore best to start with a single thread and run load tests to measure your application's performance. To do this, you can use the command-line tool shipped with the Apache Kafka (or Confluent) distribution, i.e., bin/kafka-producer-perf-test.sh, to generate load on the input topic, and monitor the metrics exposed by Kafka Streams over JMX (see: Monitoring Kafka Streams - Confluent Documentation).

Moreover, note that the maximum number of threads you can usefully allocate to your application is not necessarily equal to the number of partitions of the input topic declared in your topology. You should also take into account all the topics consumed by all the sub-topologies that your application generates.

For example, say you are consuming a topic with 3 partitions, but your application performs a repartition operation. You will then end up with two sub-topologies, each consuming one topic with 3 partitions. That gives a total of 6 tasks, which means you can configure up to 6 threads.
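The arithmetic in this example can be sketched as follows. This is a simplified model, assuming each sub-topology reads from exactly one topic, so that its task count equals that topic's partition count. The class and method names are hypothetical. In a real application you can print topology.describe() to see which sub-topologies were generated.

```java
import java.util.List;

final class TaskCount {

    // Sums the partition counts of the topics feeding each sub-topology.
    // Simplifying assumption: one input topic per sub-topology.
    static int totalTasks(List<Integer> partitionsPerSubTopology) {
        int tasks = 0;
        for (int partitions : partitionsPerSubTopology) {
            if (partitions < 1) {
                throw new IllegalArgumentException("partition count must be >= 1");
            }
            tasks += partitions;
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Source topic (3 partitions) plus repartition topic (3 partitions).
        int tasks = totalTasks(List.of(3, 3));
        // Threads beyond the task count would have nothing to run.
        System.out.println("tasks = " + tasks + ", max useful threads = " + tasks);
    }
}
```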

Note: It is usually recommended to deploy a KafkaStreams instance with a single thread and to scale horizontally by adding more instances. This simplifies the scaling model, especially when using Kubernetes (i.e., 1 pod = 1 KafkaStreams instance = 1 thread).
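To compare the two layouts from the question with the horizontally scaled one, here is a rough sketch that estimates how many tasks the busiest thread handles. It assumes tasks are spread evenly over all threads of all instances. The real Kafka Streams assignor is more involved, so treat this only as a back-of-the-envelope model. All names are hypothetical.

```java
final class DeploymentLayout {

    // Tasks on the busiest thread, assuming an even spread:
    // ceil(tasks / (instances * threadsPerInstance)).
    static int tasksOnBusiestThread(int tasks, int instances, int threadsPerInstance) {
        int totalThreads = totalThreads(instances, threadsPerInstance);
        return (tasks + totalThreads - 1) / totalThreads;
    }

    // Threads left without any task when there are more threads than tasks.
    static int idleThreads(int tasks, int instances, int threadsPerInstance) {
        return Math.max(0, totalThreads(instances, threadsPerInstance) - tasks);
    }

    private static int totalThreads(int instances, int threadsPerInstance) {
        if (instances < 1 || threadsPerInstance < 1) {
            throw new IllegalArgumentException("instances and threads must be >= 1");
        }
        return instances * threadsPerInstance;
    }

    public static void main(String[] args) {
        int tasks = 3; // one task per partition of the 3-partition topic
        System.out.println("1 instance x 1 thread:   " + tasksOnBusiestThread(tasks, 1, 1));
        System.out.println("1 instance x 3 threads:  " + tasksOnBusiestThread(tasks, 1, 3));
        System.out.println("3 instances x 1 thread:  " + tasksOnBusiestThread(tasks, 3, 1));
    }
}
```

Under this model, 1 instance with 3 threads and 3 instances with 1 thread each give the same per-thread load. The difference is operational: the second layout spreads the work across separate processes or pods.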
