
Offsets stored in Zookeeper or Kafka?

I'm a bit confused about where offsets are stored when using Kafka and Zookeeper. It seems like offsets are stored in Zookeeper in some cases, and in Kafka in others.

What determines whether the offset is stored in Kafka or in Zookeeper? And what are the pros and cons?

NB: Of course I could also store the offsets myself in some other data store, but that is out of scope for this post.

Some more details about my setup:

  • I run these versions: KAFKA_VERSION="0.10.1.0", SCALA_VERSION="2.11"
  • I connect to Kafka/Zookeeper using kafka-node from my NodeJS application.

Older versions of Kafka (pre 0.9) store offsets only in ZK, while newer versions of Kafka store offsets by default in an internal Kafka topic called __consumer_offsets (newer versions can still commit to ZK, though).

The advantage of committing offsets to the broker is that the consumer does not depend on ZK, and thus clients only need to talk to brokers, which simplifies the overall architecture. Also, for large deployments with many consumers, ZK can become a bottleneck, while Kafka can handle this load easily (committing offsets is the same as writing to a topic, and Kafka scales very well here -- in fact, by default __consumer_offsets is created with 50 partitions IIRC).
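As a side note, the mapping from a consumer group to one of those partitions is deterministic: the broker hashes the group id and takes it modulo the partition count of __consumer_offsets, so all commits for one group land in the same partition. A minimal sketch of that mapping (the group name `my-group` and the count 50 are just illustrative values; the real count is set by `offsets.topic.num.partitions`):

```java
public class OffsetsPartition {
    // Sketch: the broker picks the __consumer_offsets partition for a group
    // by hashing the group id. Masking with 0x7fffffff keeps the value
    // non-negative even for Integer.MIN_VALUE hash codes.
    public static int partitionFor(String groupId, int numPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Every commit for "my-group" goes to this same partition, which is
        // why a single broker can act as the group's coordinator.
        System.out.println(partitionFor("my-group", 50));
    }
}
```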

I am not familiar with NodeJS or kafka-node -- how offsets are committed depends on the client implementation.

Long story short: if you use 0.10.1.0 brokers, you can commit offsets to the __consumer_offsets topic. But it depends on your client, and whether it implements this protocol.

In more detail, it depends on your broker and client versions (and which consumer API you are using), because older clients can talk to newer brokers. First, both broker and client need to be version 0.9 or later to be able to write offsets to the Kafka topic. But if an older client connects to a 0.9 broker, it will still commit offsets to ZK.

For Java consumers:

It depends which consumer you are using: before 0.9 there were two "old consumers", namely the "high-level consumer" and the "low-level consumer". Both commit offsets directly to ZK. Since 0.9, both have been merged into a single consumer, called the "new consumer" (it basically unifies the low-level and high-level APIs of both old consumers -- this means that in 0.9 there are three types of consumers). The new consumer commits offsets to the brokers (i.e., to the internal Kafka topic).
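The difference is visible in how the two consumer generations are configured: the old consumers are pointed at Zookeeper via `zookeeper.connect`, while the new consumer is pointed only at brokers via `bootstrap.servers`. A sketch (the host names are placeholders for your own cluster):

```java
import java.util.Properties;

public class ConsumerConfigs {
    // Old (pre-0.9) consumer: bootstraps through Zookeeper and, by default,
    // commits its offsets there.
    public static Properties oldConsumer(String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder host
        props.put("group.id", groupId);
        return props;
    }

    // New consumer (0.9+): talks only to brokers; committed offsets end up
    // in the internal __consumer_offsets topic.
    public static Properties newConsumer(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder host
        props.put("group.id", groupId);
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

Note how the new consumer's configuration contains no Zookeeper address at all, which is exactly why it removes the client-side ZK dependency.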

To make upgrading easier, there is also the possibility to "double commit" offsets using the old consumer (as of 0.9). If you enable this via dual.commit.enabled, offsets are committed to both ZK and the __consumer_offsets topic. This allows you to switch from the old consumer API to the new consumer API while moving your offsets from ZK to the __consumer_offsets topic.
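The migration described above is driven by two old-consumer settings; a sketch of the relevant consumer-properties fragment:

```properties
# Old-consumer migration settings (0.9 era):
# commit offsets to the __consumer_offsets topic instead of ZK ...
offsets.storage=kafka
# ... and also keep committing them to ZK during the transition
dual.commit.enabled=true
```

Once all consumers in the group have migrated, dual.commit.enabled can be switched off so offsets go only to the Kafka topic.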

It all depends on which consumer you're using. You should choose the right consumer based on your Kafka version.

For version 0.8 brokers, use the HighLevelConsumer. The offsets for your groups are stored in Zookeeper.

For brokers 0.9 and higher, you should use the new ConsumerGroup. The offsets are stored with the Kafka brokers.

Keep in mind that HighLevelConsumer will still work with versions past 0.8, but it has been deprecated in 0.10.1 and support will probably go away soon. ConsumerGroup has rolling-migration options to help you move off HighLevelConsumer if you were committed to using it.

Offsets in Kafka are stored as messages in a separate topic named '__consumer_offsets'. In the latest versions of Kafka, each consumer commits a message to this topic at periodic intervals.


 