
Order of receiving messages if Kafka consumer subscribes to multiple topics

I have a consumer that polls multiple topics. For this question, I've limited it to one partition per topic. Let's say that by the time the consumer starts polling, each topic already has some data. What is the order of reads?

Is it round-robin? Does it read everything from the first topic before moving on to the next? I use consumer.poll(N) to poll.
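For illustration, a minimal sketch of the setup described in the question might look as follows; the topic names, broker address, and group id are assumptions, and the Duration overload of poll is used in place of the deprecated poll(long):

    import java.time.Duration;
    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class MultiTopicConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "multi-topic-demo");        // assumed group id
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Hypothetical topics, each with a single partition.
                consumer.subscribe(Arrays.asList("topic-a", "topic-b"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // The iteration order across the two topics is what this question is about.
                        System.out.printf("%s-%d offset=%d%n",
                                record.topic(), record.partition(), record.offset());
                    }
                }
            }
        }
    }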

There is no guaranteed ordering, because the underlying protocol allows fetching multiple partitions in one request.

When you invoke consumer.poll(N), the client actually sends FetchRequest objects to the brokers hosting the partition leaders (see org.apache.kafka.clients.consumer.internals.Fetcher.createFetchRequests()), and it sends only one request per node, not one per partition.

The important point is that the client can send a single FetchRequest covering multiple partitions (see the protocol spec).
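To see this from the client side, you can group the result of a single poll by topic-partition with ConsumerRecords.partitions(); a sketch, assuming an already-subscribed consumer:

    import java.time.Duration;
    import java.util.List;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PollInspector {
        // Assumes 'consumer' is already subscribed to several single-partition topics.
        static void inspectOnePoll(KafkaConsumer<String, String> consumer) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));

            // A single poll can return records from several topic-partitions at once;
            // which partitions show up, and in which relative order, is not specified.
            for (TopicPartition tp : records.partitions()) {
                List<ConsumerRecord<String, String>> perPartition = records.records(tp);
                System.out.printf("%s: %d records, first offset %d%n",
                        tp, perPartition.size(), perPartition.get(0).offset());
            }
        }
    }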

The ordering is rather complicated. Here is how it works for Kafka 2.6:

  • when you assign topic partitions to a consumer, they are kept in a hash table, so the order is stable but not necessarily the order in which you assigned them
  • when you call Consumer.poll(N), it returns all the messages that are already enqueued, but at most max.poll.records of them (see below)
  • when nothing is enqueued, the assigned topic partitions are grouped by the Kafka node on which the leader of each topic-partition resides
  • each of those groups is sent to its respective node in a fetch request
  • each node returns at most fetch.max.bytes (but at least one message, if one is available)
  • the node fills those bytes with messages from the requested partitions, always starting with the first
  • if no messages are left in the current partition but there are still bytes to fill, it moves on to the next partition, until there are no more messages or the buffer is full
  • the node can also decide to stop using the current partition and continue with the next one, even if there are still messages available in the current one
  • after the client/consumer receives the buffer, it splits it into CompletedFetches, where one CompletedFetch contains exactly the messages of one topic partition from the buffer
  • those CompletedFetches are enqueued (each may contain 0 messages, or 1000 or more); there is one CompletedFetch for every requested topic partition
  • since the requests to the nodes run in parallel but there is only one queue, the CompletedFetches (and thus the topic partitions) may end up in the final result in a different order than the original assignment
  • the enqueued CompletedFetches are logically flattened into one big queue
  • Consumer.poll(N) reads and dequeues at most max.poll.records from that big flattened queue (both settings are shown in the configuration sketch after this list)
  • before the records are returned to the caller of poll, another fetch request to all nodes is started, but this time all the topic partitions that are already in the flattened queue are excluded
  • this holds for all future poll calls
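The two consumer settings mentioned in the list can be set explicitly; a configuration sketch (the values are only illustrative, not recommendations):

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class FetchTuning {
        static Properties fetchTuning() {
            Properties props = new Properties();
            // Upper bound on how many records a single Consumer.poll() call returns
            // from the flattened queue of CompletedFetches.
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
            // Upper bound on the total bytes a broker packs into one fetch response;
            // the broker fills it partition by partition, as described above.
            props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);
            return props;
        }
    }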

In practice this means there is no starvation, but you may get a large number of messages from one topic before you get a large number of messages from the next topic.

In tests with a message size of 10 bytes, there were around 58000 messages read from one topic, before roughly the same amount was read from the next. All topics were prefilled with 1 million messages.

Therefore you'll have a kind of batched round robin.
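One way to observe this batched round robin is to count the length of each uninterrupted run of records from the same topic; a sketch, assuming a consumer that is already configured and subscribed as above:

    import java.time.Duration;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class RunLengthObserver {
        // Assumes 'consumer' is already subscribed to several single-partition
        // topics that have been pre-filled with messages.
        static void printRunLengths(KafkaConsumer<String, String> consumer) {
            String currentTopic = null;
            long runLength = 0;
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if (!record.topic().equals(currentTopic)) {
                        if (currentTopic != null) {
                            // Length of the uninterrupted run of records from one topic.
                            System.out.printf("%s: run of %d records%n", currentTopic, runLength);
                        }
                        currentTopic = record.topic();
                        runLength = 0;
                    }
                    runLength++;
                }
            }
        }
    }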
