
What makes Kafka high in throughput?

Most articles depict Kafka as having better read/write throughput than other message brokers (MBs) such as ActiveMQ. My understanding is that reading and writing with the help of offsets makes it faster, but I am not clear on how offsets make it faster.

After reading about Kafka's architecture, I have gained some understanding, but it is still not clear to me what makes Kafka scalable and high in throughput, based on the points below:

  1. Presumably, with the offset, the client knows exactly which message it needs to read, which may be one factor in the high performance.

    And in the case of other MBs, the broker needs to coordinate among consumers so that each message is delivered to only one consumer. But that applies only to queues, not to topics. So what makes a Kafka topic faster than other MBs' topics?

  2. Kafka provides partitioning for scalability, but other message brokers like ActiveMQ also provide clustering. So how is Kafka better for big data / high loads?

  3. In other MBs we can have listeners, so as soon as a message arrives, the broker delivers it. In Kafka's case we need to poll, which means more load on both the broker and the client side?

Lots of detail on what makes Kafka different from, and faster than, other messaging systems is in Jay Kreps' blog post here:

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

There are actually many differences that make Kafka perform well, including but not limited to:

  • Maximized use of sequential disk reads and writes
  • Zero-copy processing of messages
  • Use of Linux OS page cache rather than Java heap for caching
  • Partitioning of topics across multiple brokers in a cluster
  • Smart client libraries that offload certain functions from the brokers
  • Batching of multiple published messages to yield less frequent network round trips to the broker
  • Support for multiple in-flight messages
  • Prefetching data into client buffers for faster subsequent requests
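To make the batching point concrete, here is a broker-free Python sketch (an illustration only, not the Kafka client API) of why accumulating published messages into batches, as the producer's `batch.size`/`linger.ms` settings allow, yields far fewer network round trips than sending each message individually:

```python
# Illustrative sketch only -- not the Kafka client API.
# It models why batching publishes reduces network round trips.

def send_individually(messages):
    """One network round trip per message."""
    round_trips = 0
    for _ in messages:
        round_trips += 1  # each send hits the broker immediately
    return round_trips

def send_batched(messages, batch_size):
    """Messages are buffered and flushed batch_size at a time,
    similar in spirit to the producer's batch.size / linger.ms."""
    round_trips = 0
    buffer = []
    for msg in messages:
        buffer.append(msg)
        if len(buffer) >= batch_size:
            round_trips += 1  # one request carries the whole batch
            buffer.clear()
    if buffer:                # flush the final partial batch
        round_trips += 1
    return round_trips

msgs = [f"event-{i}" for i in range(1000)]
print(send_individually(msgs))  # 1000 round trips
print(send_batched(msgs, 100))  # 10 round trips
```

The same trade-off appears on the consumer side: larger fetches amortize the round trip, at the cost of waiting for data to accumulate.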

It's largely marketing that Kafka is fast for a message broker. For example, IBM MessageSight appliances did 13M msgs/sec with microsecond latency in 2013, on one machine, a year before Kreps even started the GitHub project: https://www.zdnet.com/article/ibm-launches-messagesight-appliance-aimed-at-m2m/

Kafka is good for a lot of things. True low-latency messaging is not one of them. You simply can't use batch delivery (e.g. a range of offsets) in any purely latency-centric environment. When an event arrives, delivery must be attempted immediately if you want the lowest latency. That rules out both waiting a couple of seconds to batch-read a block of events and enduring the overhead of requesting every message individually. Try using Kafka with an offset range of 1 (i.e. one message) if you want to compare it to a conventional push-based broker, and you'll see what I mean.
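That "offset range of 1" point can be sketched with a toy cost model (the numbers below are assumptions for illustration, not a benchmark): every pull request pays a fixed round-trip cost, so fetching one message per request multiplies that cost, while larger fetches amortize it, at the price of batching latency:

```python
# Toy cost model (assumed numbers, not a benchmark) for pull-based
# consumption: every fetch request pays a fixed round-trip time,
# so fetch size controls a throughput/latency trade-off.

RTT_US = 1000       # assumed round trip per fetch request, microseconds
PER_MSG_US = 10     # assumed per-message transfer/processing cost

def total_fetch_time_us(num_messages, messages_per_fetch):
    """Total time to pull num_messages, messages_per_fetch at a time."""
    fetches = -(-num_messages // messages_per_fetch)  # ceiling division
    return fetches * RTT_US + num_messages * PER_MSG_US

one_at_a_time = total_fetch_time_us(10_000, 1)    # like an offset range of 1
batched       = total_fetch_time_us(10_000, 500)  # amortizes the round trip
print(one_at_a_time, batched)  # 10_100_000 vs 120_000
```

The batched consumer moves far more data per unit time, but each individual message may sit in the broker until its batch is filled (or a wait timeout fires), which is exactly the latency penalty described above.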

Instead, I recommend focusing on the thing pull-based stream buffering does give you:

  • Replayability!

Personally, I think this makes downstream data-engineering systems easier to build in the face of failure, particularly since you don't have to rely on their built-in replication models (if they even have one). For example, it's very easy for me to consume messages, lose the disks, restore the machine, and replay the lost data. The data streams become the single source of truth against which other systems can synchronize, and that is exceptionally useful!
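As a minimal illustration of that replay property (an in-memory stand-in, not the Kafka consumer API): the log is append-only, a consumer merely tracks an offset, and rewinding that offset replays the very same records, much like a real consumer seeking back to an earlier position:

```python
# In-memory stand-in for an append-only, offset-addressed log.
# Not the Kafka API -- just an illustration of why offsets make
# replay trivial: rewinding the offset re-reads the same records.

class Log:
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # the record's offset

    def read_from(self, offset):
        return self._records[offset:]

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0                 # next offset to read

    def poll(self):
        batch = self.log.read_from(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset            # rewind (or skip ahead)

log = Log()
for evt in ["a", "b", "c"]:
    log.append(evt)

c = Consumer(log)
first_pass = c.poll()   # reads a, b, c

# Downstream system loses its state: rewind and replay from offset 0.
c.seek(0)
replayed = c.poll()     # the same a, b, c -- the log is the source of truth
print(first_pass == replayed)
```

Because consumption is just "read from an offset," recovering a failed downstream system is a seek plus a re-read, with no special broker-side redelivery machinery.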

There's no free lunch in messaging: pull and push each have their advantages and disadvantages relative to each other. It might not surprise you that people have also tried hybrid push-pull messaging, and it's no free lunch either :).
