
How does Zookeeper retrieve the consumer offsets from the __consumer_offsets topic?

This is a follow-up question to "Where do zookeeper store Kafka cluster and related information?", based on the answer provided by Armando Ballaci.

Now it's clear that consumer offsets are stored in the Kafka cluster in a special topic called __consumer_offsets. That's fine; I am just wondering how the retrieval of these offsets works.

Topics are not like an RDBMS, which we can query for arbitrary data based on some predicate. For example, if the data were stored in an RDBMS, a query like the one below could fetch the committed offset for a particular partition of a topic, for a particular consumer group:

select consumer_offset_read, consumer_offset_committed from consumer_offset_table where consumer_grp_id = 'x' and partition_id = 'y';

But clearly this kind of retrieval is not possible on Kafka topics. So how does the retrieval mechanism from the topic work? Could someone elaborate?

(Data from Kafka partitions is read in FIFO order, and if the normal Kafka consumer model were followed to retrieve a particular offset, a lot of additional data would have to be processed and it would be slow. So I am wondering if it's done in some other way...)

Some description regarding this that I found on the web when I stumbled upon the same question in my day job is as follows:

In Kafka releases through 0.8.1.1, consumers commit their offsets to ZooKeeper. ZooKeeper does not scale extremely well (especially for writes) when there are a large number of offsets (i.e., consumer-count * partition-count). Fortunately, Kafka now provides an ideal mechanism for storing consumer offsets. Consumers can commit their offsets in Kafka by writing them to a durable (replicated) and highly available topic. Consumers can fetch offsets by reading from this topic (although we provide an in-memory offsets cache for faster access). That is, offset commits are regular producer requests (which are inexpensive), and offset fetches are fast in-memory lookups.
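The key detail that makes this work is that __consumer_offsets is a log-compacted topic whose records are keyed by (group, topic, partition), so the group coordinator never has to scan the whole log at fetch time: it replays the compacted topic once into an in-memory map and then answers fetches with a lookup. The sketch below models this idea in plain Java; the class and record names are illustrative, not Kafka's actual internals:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model (not Kafka's real classes) of why an offset fetch is a
// memory lookup: the coordinator replays the compacted __consumer_offsets
// topic, where each record is keyed by (group, topic, partition), and keeps
// only the latest committed offset per key.
public class OffsetCacheSketch {

    record OffsetKey(String group, String topic, int partition) {}
    record OffsetRecord(OffsetKey key, long committedOffset) {}

    private final Map<OffsetKey, Long> cache = new HashMap<>();

    // Replaying the log: later records for the same key overwrite earlier
    // ones, which is exactly the guarantee log compaction preserves.
    void replay(List<OffsetRecord> log) {
        for (OffsetRecord r : log) {
            cache.put(r.key(), r.committedOffset());
        }
    }

    // An offset fetch is then a constant-time map lookup, not a log scan.
    Long fetch(String group, String topic, int partition) {
        return cache.get(new OffsetKey(group, topic, partition));
    }

    public static void main(String[] args) {
        OffsetCacheSketch coordinator = new OffsetCacheSketch();
        coordinator.replay(List.of(
            new OffsetRecord(new OffsetKey("x", "demoTopic", 0), 10L),
            new OffsetRecord(new OffsetKey("x", "demoTopic", 0), 42L), // newer commit wins
            new OffsetRecord(new OffsetKey("x", "demoTopic", 1), 7L)));
        System.out.println(coordinator.fetch("x", "demoTopic", 0)); // prints 42
    }
}
```

This also explains why an ordinary consumer never needs to "query" the topic: only the coordinator broker that owns a group's partition of __consumer_offsets serves that group's fetches, straight from its cache.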

The official Kafka documentation describes how the feature works and how to migrate offsets from ZooKeeper to Kafka. This wiki provides sample code that shows how to use the Kafka-based offset storage mechanism.

// Imports for the legacy (pre-0.9) client classes used by this wiki sample;
// exact package locations may vary slightly by Kafka version.
import java.io.IOException;
import kafka.api.ConsumerMetadataRequest;
import kafka.api.ConsumerMetadataResponse;
import kafka.cluster.Broker;
import kafka.common.ErrorMapping;
import kafka.common.TopicAndPartition;
import kafka.network.BlockingChannel;

try {
    BlockingChannel channel = new BlockingChannel("localhost", 9092,
            BlockingChannel.UseDefaultBufferSize(),
            BlockingChannel.UseDefaultBufferSize(),
            5000 /* read timeout in millis */);
    channel.connect();
    final String MY_GROUP = "demoGroup";
    final String MY_CLIENTID = "demoClientId";
    int correlationId = 0;
    final TopicAndPartition testPartition0 = new TopicAndPartition("demoTopic", 0);
    final TopicAndPartition testPartition1 = new TopicAndPartition("demoTopic", 1);

    // Step 1: ask any broker which broker is the offset coordinator for this group
    channel.send(new ConsumerMetadataRequest(MY_GROUP, ConsumerMetadataRequest.CurrentVersion(), correlationId++, MY_CLIENTID));
    ConsumerMetadataResponse metadataResponse = ConsumerMetadataResponse.readFrom(channel.receive().buffer());

    if (metadataResponse.errorCode() == ErrorMapping.NoError()) {
        Broker offsetManager = metadataResponse.coordinator();
        // if the coordinator is different from the above channel's host, reconnect
        channel.disconnect();
        channel = new BlockingChannel(offsetManager.host(), offsetManager.port(),
                                      BlockingChannel.UseDefaultBufferSize(),
                                      BlockingChannel.UseDefaultBufferSize(),
                                      5000 /* read timeout in millis */);
        channel.connect();
        // Step 2 (continued in the full wiki sample): send OffsetCommitRequest /
        // OffsetFetchRequest for testPartition0 and testPartition1 to this coordinator.
    } else {
        // retry (after backoff)
    }
}
catch (IOException e) {
    // retry the query (after backoff)
}


The idea is that if you need the kind of functionality you describe, you need to store the data in an RDBMS, a NoSQL database, or an ELK stack. A good pattern for that is Kafka Connect with a sink connector. Normal message processing in Kafka is done through consumers or stream definitions that react to events as they arrive. You can certainly seek to an offset or timestamp in some cases, and that is completely possible...
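To make the sink-connector pattern concrete, here is what such a setup might look like using the Confluent JDBC sink connector; the connector name, topic, and database details below are purely illustrative assumptions, so adjust them to your environment:

```json
{
  "name": "demo-topic-to-postgres",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "demoTopic",
    "connection.url": "jdbc:postgresql://localhost:5432/analytics",
    "connection.user": "kafka_connect",
    "connection.password": "changeme",
    "auto.create": "true",
    "insert.mode": "upsert",
    "pk.mode": "record_key"
  }
}
```

Once the data lands in the database, arbitrary predicate-based queries like the SQL example earlier in the question become straightforward, which is exactly the gap this pattern fills.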

In the latest versions of Kafka the offsets are not kept in ZooKeeper anymore, so ZooKeeper is not involved in consumer offset handling at all.
