
Order of receiving messages if a Kafka consumer subscribes to multiple topics

Original post: 2018-11-05 05:50:28 · Tags: apache-kafka, kafka-consumer-api

I have a consumer that polls multiple topics. For this question, I've limited each topic to one partition. Let's say that by the time the consumer starts polling, each topic already has some data. What is the order of reads?

Is it round-robin? Does it read everything from the first topic before moving on to the next? I use consumer.poll(N) to poll.

2 Answers

There is no guaranteed ordering across topics, because the underlying protocol allows the client to request data for multiple partitions in a single request.

When you invoke consumer.poll(N), the client actually sends FetchRequest objects to the brokers hosting the partition leaders (see org.apache.kafka.clients.consumer.internals.Fetcher.createFetchRequests()). It sends only one request per node, not one per partition.

What matters is that the client can send one FetchRequest for multiple partitions (see the protocol spec).
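The per-node grouping described above can be sketched in plain Python. This is a sketch, not the Kafka client. The function `build_fetch_requests`, the topic names, and the leader map are hypothetical. The real logic lives in `Fetcher.createFetchRequests()`.

```python
def build_fetch_requests(assignment, leader_of):
    """Group assigned partitions into one fetch request per leader node.

    assignment: (topic, partition) tuples in assignment order.
    leader_of:  maps each (topic, partition) to the id of its leader broker.
    Returns a dict of node id -> partitions to put in that node's request.
    """
    requests = {}
    for tp in assignment:
        # Partitions led by the same broker share one request.
        requests.setdefault(leader_of[tp], []).append(tp)
    return requests


# Hypothetical assignment: three single-partition topics spread over two brokers.
assignment = [("orders", 0), ("payments", 0), ("audit", 0)]
leader_of = {("orders", 0): 1, ("payments", 0): 2, ("audit", 0): 1}
print(build_fetch_requests(assignment, leader_of))
# {1: [('orders', 0), ('audit', 0)], 2: [('payments', 0)]}
```

Node 1 receives a single request covering both `orders-0` and `audit-0`. The relative order of those two topics therefore depends on what the broker packs into that one response.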

The ordering is rather complicated. Here is how it works in Kafka 2.6:

When you assign topic partitions to a consumer, they are kept in a hash table, so their order is stable but not necessarily the order you used.
When you call Consumer.poll(N), it returns all the enqueued messages, but at most max.poll.records of them (see below).
When nothing is enqueued, all the topic partitions you assigned are grouped by the Kafka node that hosts the leader of each topic partition.
Each of those per-node lists is sent to its node in a fetch request.
Each node returns at most fetch.max.bytes (or at least one message, if any is available).
The node fills those bytes with messages from the requested partitions, always starting with the first partition.
If the current partition has no messages left but there are still bytes to fill, the node moves to the next partition. It continues until there are no more messages or the buffer is full.
The node can also decide to stop reading the current partition and continue with the next one, even if the current one still has messages available.
After the client/consumer receives the buffer, it splits it into CompletedFetches, where each CompletedFetch contains exactly the messages of one topic partition from the buffer.
Those CompletedFetches are enqueued (each may contain 0 messages, 1000, or more). There is one CompletedFetch for every requested topic partition.
The requests to the nodes run in parallel, but there is only one queue. As a result, the CompletedFetches (and therefore the topic partitions) may end up in a different order in the final result than in the original assignment.
The enqueued CompletedFetches are logically flattened into one big queue.
Consumer.poll(N) reads and dequeues at most max.poll.records records from that flattened queue.
Before the records are returned to the caller of poll, another fetch request to all nodes is started. This time, all topic partitions that are already in the flattened queue are excluded.
The same holds for all subsequent poll calls.
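The steps above can be sketched as a toy model in Python. It is not the client's code, and `ConsumerModel` and its parameters are hypothetical. Each message costs 1 unit of budget instead of its byte size. `fetch_max` stands in for fetch.max.bytes and `max_poll_records` for max.poll.records. `partition_max` is an added assumption that models a node moving on to the next partition early. In the real client, that per-partition limit is presumably max.partition.fetch.bytes.

```python
from collections import deque


class ConsumerModel:
    """Toy model of the fetch/poll cycle (not Kafka's implementation).

    Assumptions: 1 message = 1 unit of fetch budget, brokers answer instantly,
    and node responses are enqueued in node order instead of racing.
    """

    def __init__(self, logs, leader_of, fetch_max, partition_max, max_poll_records):
        self.logs = logs                    # (topic, partition) -> list of messages
        self.leader_of = leader_of          # (topic, partition) -> node id
        self.fetch_max = fetch_max          # stand-in for fetch.max.bytes
        self.partition_max = partition_max  # assumed per-partition cap
        self.max_poll_records = max_poll_records
        self.position = {tp: 0 for tp in logs}  # next offset to fetch
        self.queue = deque()                # CompletedFetch entries: [tp, records]

    def _fetch(self):
        # Exclude partitions that still have a CompletedFetch in the queue.
        queued = {tp for tp, _ in self.queue}
        by_node = {}
        for tp in self.logs:
            if tp not in queued:
                by_node.setdefault(self.leader_of[tp], []).append(tp)
        for partitions in by_node.values():
            budget = self.fetch_max
            for tp in partitions:  # the node fills its buffer from the first partition on
                start = self.position[tp]
                take = min(budget, self.partition_max)
                records = self.logs[tp][start:start + take]
                budget -= len(records)
                self.position[tp] = start + len(records)
                # One CompletedFetch per requested partition, possibly empty.
                self.queue.append([tp, list(records)])

    def poll(self):
        if not self.queue:
            self._fetch()
        out = []
        # Drain the logically flattened queue, up to max_poll_records records.
        while self.queue and len(out) < self.max_poll_records:
            tp, records = self.queue[0]
            taken = records[: self.max_poll_records - len(out)]
            out.extend((tp[0], r) for r in taken)
            del records[: len(taken)]
            if not records:
                self.queue.popleft()
        self._fetch()  # pre-fetch before returning, skipping queued partitions
        return out


# Two single-partition topics, both led by node 1, 10 messages each.
logs = {("a", 0): list(range(10)), ("b", 0): list(range(10))}
leader_of = {("a", 0): 1, ("b", 0): 1}
consumer = ConsumerModel(logs, leader_of, fetch_max=100, partition_max=4,
                         max_poll_records=3)
seen = []
for _ in range(10):
    seen.extend(consumer.poll())
print("".join(topic for topic, _ in seen))  # aaaabbbbaaaabbbbaabb
```

With a cap of 4 messages per partition, the topics come out in runs of 4 even though each poll returns only 3 records. This is the batched round robin described at the end of this answer. Within each partition, messages still arrive in offset order.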

In practice, this means that no topic will starve. However, you may receive a large number of messages from one topic before you get a large number of messages from the next.

In tests with a message size of 10 bytes, around 58,000 messages were read from one topic before roughly the same number was read from the next. All topics were prefilled with 1 million messages.

Therefore, you get a kind of batched round robin.
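To check for this pattern in your own consumer, you can record the topic of every record in the order poll returns them and collapse consecutive repeats into runs. The helper below, `topic_runs`, is hypothetical. It assumes you have already collected the topic names into a list, for example from `ConsumerRecord.topic()` in the Java client. The sample data is made up.

```python
from itertools import groupby


def topic_runs(topics):
    """Collapse a sequence of topic names into (topic, run_length) pairs."""
    return [(topic, sum(1 for _ in group)) for topic, group in groupby(topics)]


# Topics in the order records came out of successive poll() calls (made-up sample).
observed = ["orders"] * 4 + ["payments"] * 4 + ["orders"] * 2
print(topic_runs(observed))  # [('orders', 4), ('payments', 4), ('orders', 2)]
```

A batched round robin shows up as long runs that alternate between topics rather than records interleaved one by one. In the test quoted above, each run was roughly 58,000 messages.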