Order of receiving messages if Kafka consumer subscribes to multiple topics
I have a consumer that polls multiple topics. For this question, I've limited each topic to one partition. Let's say that by the time the consumer starts polling, each topic already has some data. What is the order of reads? Is it round-robin? Does it read everything from the first topic before moving on to the next? I use consumer.poll(N) to poll.
There is no ordering, as the underlying protocol allows sending requests for multiple partitions in one request.
When you invoke consumer.poll(N), the client actually sends FetchRequest objects to the brokers that host the partition leaders (see org.apache.kafka.clients.consumer.internals.Fetcher.createFetchRequests()), and it sends only one request per node, not one per partition.
What is important is that the client can send one FetchRequest for multiple partitions (see the protocol spec).
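To make the "one request per node" point concrete, here is a minimal sketch in plain Java. It is a toy model, not Kafka client code: the class FetchGrouping, the method groupByLeader, and the partition and broker names are all hypothetical and only illustrate how partitions sharing a leader would end up in the same request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (assumption): illustrates one fetch request per broker node.
class FetchGrouping {

    // Input: partition name -> node id of its leader (both made up).
    // Output: node id -> partitions that would share a single fetch request.
    static Map<String, List<String>> groupByLeader(Map<String, String> leaderOfPartition) {
        Map<String, List<String>> requestPerNode = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : leaderOfPartition.entrySet()) {
            requestPerNode
                .computeIfAbsent(e.getValue(), node -> new ArrayList<>())
                .add(e.getKey());
        }
        return requestPerNode;
    }

    public static void main(String[] args) {
        Map<String, String> leaders = new LinkedHashMap<>();
        leaders.put("topicA-0", "broker-1");
        leaders.put("topicB-0", "broker-2");
        leaders.put("topicC-0", "broker-1");
        // prints {broker-1=[topicA-0, topicC-0], broker-2=[topicB-0]}
        System.out.println(groupByLeader(leaders));
    }
}
```

With three topics spread over two brokers, the model produces two requests, and the request to broker-1 covers two partitions at once, which is why no per-topic ordering can be derived from the order of subscription.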
The ordering is rather complicated. Here is how it works for Kafka 2.6:
- If messages are already enqueued when Consumer.poll(N) is called, it returns all the enqueued messages, but at most max.poll.records (see below).
- A fetch returns at most fetch.max.bytes of data (or at least one message if available).
- The fetched data is held as CompletedFetches, where one CompletedFetch contains exactly all the messages of one topic partition from the buffer.
- The CompletedFetches are enqueued (they may contain 0 messages, 1000, or more). There will be one CompletedFetch for every requested topic partition.
- The CompletedFetches/topic partitions may be mixed up in the final result as opposed to the original assignment order.
- The enqueued CompletedFetches are logically flattened into one big queue.
- Consumer.poll(N) will read and dequeue at most max.poll.records from that flattened big queue.
- Before the records are returned to the caller of poll, another fetch request to all nodes is started, but this time all the topic partitions that are already in the flattened queue are excluded.
- The same applies to subsequent poll calls.
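The steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not the Kafka client: the class PollOrderSketch and its fields are hypothetical, fetchSize is a made-up per-partition record cap standing in for the byte limits, and fetches are enqueued in a fixed order, whereas the real client gives no such guarantee.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (assumption) of the queue behaviour described above.
class PollOrderSketch {

    // One fetched chunk belonging to exactly one partition.
    private static final class CompletedFetch {
        final String partition;
        int remaining;
        CompletedFetch(String partition, int remaining) {
            this.partition = partition;
            this.remaining = remaining;
        }
    }

    private final Map<String, Integer> backlog = new LinkedHashMap<>(); // unread records on the broker
    private final Deque<CompletedFetch> queue = new ArrayDeque<>();     // the flattened queue
    private final int fetchSize;      // hypothetical cap per partition and fetch
    private final int maxPollRecords; // plays the role of max.poll.records

    PollOrderSketch(Map<String, Integer> initialBacklog, int fetchSize, int maxPollRecords) {
        backlog.putAll(initialBacklog);
        this.fetchSize = fetchSize;
        this.maxPollRecords = maxPollRecords;
    }

    // Fetches only for partitions that have nothing left in the queue.
    private void fetchMissingPartitions() {
        for (Map.Entry<String, Integer> e : backlog.entrySet()) {
            boolean alreadyQueued = false;
            for (CompletedFetch f : queue) {
                if (f.partition.equals(e.getKey())) {
                    alreadyQueued = true;
                    break;
                }
            }
            int n = Math.min(fetchSize, e.getValue());
            if (!alreadyQueued && n > 0) {
                queue.addLast(new CompletedFetch(e.getKey(), n));
                e.setValue(e.getValue() - n);
            }
        }
    }

    // Returns the partition name of every record handed out by one poll.
    List<String> poll() {
        if (queue.isEmpty()) {
            fetchMissingPartitions();
        }
        List<String> records = new ArrayList<>();
        while (records.size() < maxPollRecords && !queue.isEmpty()) {
            CompletedFetch head = queue.peekFirst();
            records.add(head.partition);
            if (--head.remaining == 0) {
                queue.removeFirst();
            }
        }
        fetchMissingPartitions(); // next fetch starts before the records reach the caller
        return records;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new LinkedHashMap<>();
        data.put("A-0", 6);
        data.put("B-0", 6);
        PollOrderSketch consumer = new PollOrderSketch(data, 3, 2);
        for (int i = 0; i < 6; i++) {
            System.out.println(consumer.poll());
        }
    }
}
```

In this model the records come out as three from A-0, three from B-0, then three from A-0 again, and so on: each partition is drained one fetched chunk at a time, and a partition is only refetched once its chunk has left the queue.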
In practice that means no topic is starved, but you may get a large number of messages from one topic before you get a large number of messages from the next topic.
In tests with a message size of 10 bytes, around 58000 messages were read from one topic before roughly the same number was read from the next. All topics were prefilled with 1 million messages.
Therefore you'll have a kind of batched round robin.
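If you want to observe this pattern in your own test, one option is to record the topic of each record in arrival order (for example from ConsumerRecord.topic()) and collapse that sequence into runs. The helper below is only a sketch; the class TopicRuns, the method runs, and the topic names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: summarises in which order topics showed up.
class TopicRuns {

    // Collapses a sequence of topic names into runs such as "A x3".
    static List<String> runs(List<String> topicsInArrivalOrder) {
        List<String> result = new ArrayList<>();
        String current = null;
        int count = 0;
        for (String topic : topicsInArrivalOrder) {
            if (topic.equals(current)) {
                count++;
            } else {
                if (current != null) {
                    result.add(current + " x" + count);
                }
                current = topic;
                count = 1;
            }
        }
        if (current != null) {
            result.add(current + " x" + count);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A x3, B x2, A x1]
        System.out.println(runs(Arrays.asList("A", "A", "A", "B", "B", "A")));
    }
}
```

Long runs that alternate between topics would match the batched round robin described above; the run lengths you see depend on message size and consumer settings.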