Is Zookeeper a must for Kafka? [closed]

In Kafka, I would like to use only a single broker, a single topic and a single partition, with one producer and multiple consumers (each consumer getting its own copy of the data from the broker). Given this, I do not want the overhead of using ZooKeeper. Can I not just use the broker alone? Why is ZooKeeper a must?
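A note on the setup described in the question: in Kafka, consumers that should each receive every message are normally given different `group.id` values. The records stay in the partition's log, and each consumer group tracks its own read offset. The toy in-memory model below illustrates that idea; the class and method names are hypothetical and this is not the Kafka client API.

```python
from collections import defaultdict


class SinglePartitionLog:
    """Toy model of one topic with one partition (hypothetical, not Kafka's API)."""

    def __init__(self):
        self.records = []                # append-only log; list index == offset
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def produce(self, value):
        self.records.append(value)
        return len(self.records) - 1     # offset assigned to the new record

    def poll(self, group, max_records=10):
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # each group advances independently
        return batch


log = SinglePartitionLog()
for v in ["a", "b", "c"]:
    log.produce(v)

print(log.poll("group-1"))  # ['a', 'b', 'c']
print(log.poll("group-2"))  # ['a', 'b', 'c'], an independent copy
print(log.poll("group-1"))  # [], group-1 has caught up
```

Reading does not delete anything from the log, which is why any number of groups can each get the full stream.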

Yes, ZooKeeper is required for running Kafka. From the Kafka Getting Started documentation:

Step 2: Start the server

Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

As to why: people long ago discovered that you need some way of coordinating tasks, state management, configuration, etc. across a distributed system. Some projects have built their own mechanisms (think of the config servers in a MongoDB sharded cluster, or the master node in an Elasticsearch cluster). Others have chosen to take advantage of ZooKeeper as a general-purpose distributed process coordination system. Kafka, Storm, HBase and SolrCloud, to name just a few, all use ZooKeeper to help with management and coordination.

Kafka is a distributed system and is built to use ZooKeeper. The fact that you are not using any of Kafka's distributed features does not change how it was built. In any event, there should not be much overhead from using ZooKeeper. A bigger question is why you would use this particular design pattern: a single-broker Kafka deployment misses out on all of the reliability features of a multi-broker cluster, along with its ability to scale.
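To put the reliability point in numbers, here is a sketch under a deliberately simple model. Kafka places each replica of a partition on a different broker, so the replication factor cannot exceed the broker count, and a partition's data survives at most (replication factor - 1) broker failures. The model ignores `min.insync.replicas`, unclean leader election and the durability of acknowledged writes; the function name is hypothetical.

```python
def max_broker_failures_survivable(num_brokers, replication_factor):
    """Broker failures a partition's data can survive, under a simple model.

    Each replica of a partition lives on a different broker, so the
    replication factor can never exceed the number of brokers.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > num_brokers:
        raise ValueError(
            f"replication factor {replication_factor} exceeds "
            f"available brokers {num_brokers}")
    return replication_factor - 1


print(max_broker_failures_survivable(1, 1))  # 0: single broker, no redundancy
print(max_broker_failures_survivable(3, 3))  # 2
```

With one broker, the only valid replication factor is 1, so losing that broker means losing the partition.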

As explained by others, Kafka (even in the most recent version) will not work without ZooKeeper.

Kafka uses ZooKeeper for the following:

Electing a controller. The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders, replacing the partition leaders on the node that is going away. ZooKeeper is used to elect a controller, make sure there is only one, and elect a new one if it crashes.

Cluster membership: which brokers are alive and part of the cluster? This is also managed through ZooKeeper.

Topic configuration: which topics exist, how many partitions each has, where the replicas are, who the preferred leader is, and what configuration overrides are set for each topic.

(0.9.0) Quotas: how much data each client is allowed to read and write.

(0.9.0) ACLs: who is allowed to read and write to which topic.

(Old high-level consumer) Which consumer groups exist, who their members are, and the latest offset each group got from each partition.

[from https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka/answer/Gwen-Shapira ]
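The controller election described above relies on two ZooKeeper features: ephemeral nodes (which disappear when the session that created them ends) and one-shot watches. In ZooKeeper-mode Kafka, the controller is the broker that succeeds in creating the ephemeral `/controller` znode; the others watch it and retry when it vanishes. The sketch below simulates that in memory. `ToyZooKeeper`, `Broker` and `current_controller` are hypothetical names, not the real ZooKeeper client or Kafka code.

```python
class ToyZooKeeper:
    """In-memory stand-in for ephemeral nodes and one-shot watches (hypothetical)."""

    def __init__(self):
        self.nodes = {}     # path -> (data, owning session id)
        self.watchers = {}  # path -> callbacks fired once when the node is deleted

    def create_ephemeral(self, path, data, session):
        if path in self.nodes:
            raise FileExistsError(path)  # plays the role of a "node exists" error
        self.nodes[path] = (data, session)

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def expire_session(self, session):
        # Ephemeral nodes disappear when the session that created them ends.
        owned = [p for p, (_, s) in self.nodes.items() if s == session]
        for path in owned:
            del self.nodes[path]
            for callback in self.watchers.pop(path, []):  # watches fire only once
                callback()


class Broker:
    def __init__(self, broker_id, zk):
        self.broker_id = broker_id
        self.zk = zk

    def try_become_controller(self):
        try:
            self.zk.create_ephemeral("/controller", self.broker_id,
                                     session=self.broker_id)
        except FileExistsError:
            # Lost the race: try again when the current controller's node vanishes.
            self.zk.watch("/controller", self.try_become_controller)


def current_controller(zk):
    entry = zk.nodes.get("/controller")
    return entry[0] if entry else None


zk = ToyZooKeeper()
brokers = [Broker(i, zk) for i in (1, 2, 3)]
for b in brokers:
    b.try_become_controller()
print(current_controller(zk))  # 1
zk.expire_session(1)           # broker 1 crashes and its session times out
print(current_controller(zk))  # 2
```

Because node creation is atomic, at most one broker can hold `/controller` at a time, which is the "make sure there is only one" guarantee.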

Regarding your scenario (only one broker instance and one producer with multiple consumers), you can use Pusher to create a channel and push events to that channel; consumers can subscribe to it and handle those events. https://pusher.com/

Updated in Oct 2022

For new clusters in the 3.3 release, you can use Apache Kafka without ZooKeeper (in the new mode, called KRaft mode) in production.

Apache Kafka Raft (KRaft) is the consensus protocol that was introduced to remove Apache Kafka's dependency on ZooKeeper for metadata management. The development progress is tracked in KIP-500.

KRaft mode was released as early access in Kafka 2.8. It was not suitable for production before version 3.3 (see details in KIP-833: Mark KRaft as Production Ready).



1. Benefits of Kafka's new quorum controller

  1. Enables Kafka clusters to scale to millions of partitions through improved control plane performance with the new metadata management
  2. Improves stability, simplifies the software, and makes it easier to monitor, administer, and support Kafka.
  3. Allows Kafka to have a single security model for the whole system
  4. Provides a lightweight, single-process way to get started with Kafka
  5. Makes controller failover near-instantaneous

2. Timeline. Note: this timeline is very rough and subject to change.


  • 2022/10: KRaft mode declared production-ready in Kafka 3.3
  • 2023/02: Upgrade from ZK mode supported in Kafka 3.4 as early access.
  • 2023/06: Kafka 3.5 released with both KRaft and ZK support. Upgrade from ZK goes production. ZooKeeper mode deprecated.
  • 2025/03: Kafka 4.0 released with only KRaft mode supported.

References:

  1. KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
  2. Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency
  3. Preparing Your Clients and Tools for KIP-500: ZooKeeper Removal from Apache Kafka
  4. KRaft: Apache Kafka Without ZooKeeper

Kafka is built to use ZooKeeper. There is no escaping that.

Kafka is a distributed system and uses ZooKeeper to track the status of Kafka cluster nodes. It also keeps track of Kafka topics, partitions, etc.
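To make "tracks nodes, topics and partitions" concrete: as far as I know, ZooKeeper-mode Kafka keeps that metadata under znode paths like the ones below. Treat the exact paths as something to verify against your Kafka version (a chroot such as `/kafka` may also prefix them). The `znode_path` helper is hypothetical.

```python
# Approximate znode layout of ZooKeeper-mode (pre-KRaft) Kafka; verify per version.
ZNODE_TEMPLATES = {
    "controller": "/controller",                            # ephemeral, current controller
    "broker": "/brokers/ids/{broker_id}",                   # ephemeral, one per live broker
    "topic_assignment": "/brokers/topics/{topic}",          # partition -> replica list
    "partition_state":
        "/brokers/topics/{topic}/partitions/{partition}/state",  # leader and ISR
    "topic_config": "/config/topics/{topic}",               # per-topic overrides
    "client_quota": "/config/clients/{client_id}",          # quotas (0.9.0+)
    "old_consumer_offset": "/consumers/{group}/offsets/{topic}/{partition}",
}


def znode_path(kind, chroot="", **fields):
    """Hypothetical helper: fill in a template, optionally under a chroot."""
    return chroot.rstrip("/") + ZNODE_TEMPLATES[kind].format(**fields)


print(znode_path("broker", broker_id=3))
print(znode_path("partition_state", topic="orders", partition=0))
```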

Looking at your question, it seems you do not need Kafka. You can use any application that supports pub-sub, such as Redis or RabbitMQ, or hosted solutions such as PubNub.

IMHO, ZooKeeper is not an overhead; it makes your life a lot easier.

It is basically used to maintain coordination between different nodes in a cluster. One of the most important things for Kafka is that (with the old consumer) it used ZooKeeper to periodically commit offsets, so that in case of a node failure it can resume from the previously committed offset (imagine taking care of all this on your own).
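The resume-from-last-commit behaviour can be sketched as a toy loop. The old (pre-0.9) consumer stored these committed offsets in ZooKeeper; newer consumers commit them to Kafka's internal `__consumer_offsets` topic, but the recovery logic is the same idea. The function name and its parameters are hypothetical. The example shows why records processed after the last commit are delivered again after a crash (at-least-once delivery).

```python
def consume_with_periodic_commits(records, commit_every, crash_after=None, start=0):
    """Process records from `start`, committing every `commit_every` records.

    Returns (processed, committed_offset), where committed_offset is the next
    offset to read after a restart. If `crash_after` is set, stop abruptly after
    processing that many records, without a final commit.
    """
    committed = start
    processed = []
    for offset in range(start, len(records)):
        processed.append(records[offset])
        if (offset + 1 - start) % commit_every == 0:
            committed = offset + 1          # periodic commit
        if crash_after is not None and len(processed) == crash_after:
            return processed, committed     # crash: uncommitted progress is lost
    return processed, len(records)          # clean shutdown: final commit


records = list("abcdefg")
first, committed = consume_with_periodic_commits(records, commit_every=3, crash_after=5)
second, _ = consume_with_periodic_commits(records, commit_every=3, start=committed)
print(first)      # ['a', 'b', 'c', 'd', 'e']
print(committed)  # 3
print(second)     # ['d', 'e', 'f', 'g']; 'd' and 'e' are processed twice
```

Committing more often shrinks the replay window but costs more writes to the offset store, which is part of why offset storage moved out of ZooKeeper.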

ZooKeeper also plays a vital role in many other areas, such as leader detection, configuration management, synchronization, and detecting when a node joins or leaves the cluster.

Future Kafka releases are planning to remove the ZooKeeper dependency, but as of now it is an integral part of Kafka.

Here are a few lines taken from their FAQ page:

Once the ZooKeeper quorum is down, brokers could end up in a bad state and be unable to serve client requests normally, etc. Although the Kafka brokers should be able to return to a normal state automatically when the ZooKeeper quorum recovers, there are still a few corner cases where they cannot, and a hard kill-and-recovery is required to bring them back to normal. Hence it is recommended to closely monitor your ZooKeeper cluster and provision it so that it is performant.
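For context on "the ZooKeeper quorum is down": a ZooKeeper ensemble keeps serving only while a strict majority of its servers are up. The arithmetic below is a small sketch of that majority rule (the function names are my own); it also shows why odd ensemble sizes are usually recommended.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size, alive):
    return alive >= quorum_size(ensemble_size)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 4 servers tolerate only 1 failure, the same as 3, so the extra server buys nothing.
```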

For more details, check here.

Apache Kafka v2.8.0 gives you early access to KIP-500, which removes Kafka's dependency on ZooKeeper, meaning Kafka no longer requires Apache ZooKeeper.


Instead, Kafka can now run in Kafka Raft metadata mode (KRaft mode), which enables an internal Raft quorum. When Kafka runs in KRaft mode, its metadata is no longer stored in ZooKeeper but in this internal quorum of controller nodes instead. This means that you don't have to run ZooKeeper at all any longer.
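In KRaft mode, the controller quorum replicates a metadata log using Raft, and a metadata record counts as committed once a majority of the voters have it. The sketch below shows how a Raft leader can compute that commit point from how far each voter has replicated. It is simplified (real Raft also requires the entry to be from the leader's current term, and elections, terms and snapshots are omitted) and is not Kafka's implementation; the function name is hypothetical.

```python
def majority_commit_index(match_index):
    """Highest log index stored on a strict majority of voters.

    `match_index` holds, for every voter including the leader, the highest log
    index known to be replicated on that voter.
    """
    ranked = sorted(match_index, reverse=True)
    majority = len(ranked) // 2 + 1
    # The majority-th highest value is present on at least `majority` voters.
    return ranked[majority - 1]


print(majority_commit_index([10, 8, 5]))  # 8: present on two of three voters
```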

Note, however, that v2.8.0 is currently early access, and you should not use ZooKeeper-less Kafka in production for the time being.


A few benefits of removing the ZooKeeper dependency and replacing it with an internal quorum:

  • More efficient, as controllers no longer need to communicate with ZooKeeper to fetch cluster state metadata every time the cluster starts up or a controller election is made
  • More scalable, as the new implementation will be able to support many more topics and partitions in KRaft mode
  • Easier cluster management and configuration, as you don't have to manage two distinct services any longer
  • Single-process Kafka cluster

For more details, you can read the article Kafka No Longer Requires ZooKeeper.

Important update (August 2019):

The ZooKeeper dependency will be removed from Apache Kafka. See the high-level discussion in KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum.

These efforts will take a few Kafka releases and additional KIPs. Kafka controllers will take over the tasks currently handled by ZooKeeper. The controllers will leverage the benefits of the event log, which is a core concept of Kafka.
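The "event log" idea means cluster metadata becomes an ordered sequence of records that a controller or broker replays to build its current view, and a node that falls behind only needs to apply the records after the last offset it has seen. In KRaft this log lives in the internal `__cluster_metadata` topic. The record types and field names below are made up for illustration; real KRaft records have different schemas.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataState:
    brokers: set = field(default_factory=set)
    topics: dict = field(default_factory=dict)  # topic -> partition count
    last_applied: int = -1                      # offset of the last applied record


def apply(state, offset, record):
    """Apply one hypothetical metadata record to the in-memory state."""
    kind = record["type"]
    if kind == "RegisterBroker":
        state.brokers.add(record["broker_id"])
    elif kind == "UnregisterBroker":
        state.brokers.discard(record["broker_id"])
    elif kind == "CreateTopic":
        state.topics[record["name"]] = record["partitions"]
    elif kind == "DeleteTopic":
        state.topics.pop(record["name"], None)
    else:
        raise ValueError(f"unknown record type {kind!r}")
    state.last_applied = offset


def replay(log, state=None):
    """Build state from scratch, or catch up from state.last_applied."""
    if state is None:
        state = MetadataState()
    for offset in range(state.last_applied + 1, len(log)):
        apply(state, offset, log[offset])
    return state


metadata_log = [
    {"type": "RegisterBroker", "broker_id": 1},
    {"type": "CreateTopic", "name": "orders", "partitions": 3},
    {"type": "RegisterBroker", "broker_id": 2},
]
state = replay(metadata_log)
print(state.brokers, state.topics)  # {1, 2} {'orders': 3}
```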

Some benefits of the new Kafka architecture are a simpler architecture, ease of operations and better scalability (e.g. allowing "unlimited partitions").

Yes, ZooKeeper is a must for Kafka by design, because ZooKeeper is responsible for managing the Kafka cluster. It keeps a list of all Kafka brokers. It notifies Kafka if any broker or partition goes down, or if a new broker or partition comes up. In short, ZK keeps every Kafka broker updated about the current state of the Kafka cluster.

Then all any Kafka client (producer/consumer) needs to do is connect to any single broker; that broker has all the metadata kept up to date via ZooKeeper, so the client does not have to deal with the headache of broker discovery.
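Because any broker can answer a metadata request, a client only needs a short bootstrap list. The toy sketch below tries the bootstrap servers in order until one answers, then routes each partition's requests to that partition's leader. `fetch_metadata` stands in for the real metadata request, and all names and the fake cluster are hypothetical.

```python
def bootstrap(bootstrap_servers, fetch_metadata):
    """Return cluster metadata from the first bootstrap server that answers."""
    errors = []
    for server in bootstrap_servers:
        try:
            return fetch_metadata(server)
        except ConnectionError as exc:
            errors.append((server, exc))
    raise ConnectionError(f"no bootstrap server reachable: {errors}")


def leader_for(metadata, topic, partition):
    """Look up which broker currently leads a partition."""
    return metadata["leaders"][(topic, partition)]


# Fake cluster for the example: broker-a is down, broker-b answers.
CLUSTER = {"leaders": {("orders", 0): "broker-c", ("orders", 1): "broker-b"}}


def fake_fetch(server):
    if server == "broker-a":
        raise ConnectionError("broker-a is down")
    return CLUSTER


md = bootstrap(["broker-a", "broker-b"], fake_fetch)
print(leader_for(md, "orders", 0))  # broker-c, a broker that was never in the list
```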

Other than the usual payload message transfer, many other communications happen in Kafka, such as:
  • Events related to brokers requesting cluster membership
  • Events related to brokers becoming available
  • Getting bootstrap config setups
  • Events related to controller and leader updates
  • Status updates such as heartbeats
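Heartbeats are how liveness is detected: a member that stops heartbeating for longer than its session timeout is considered gone (in ZooKeeper, this is when a broker's session expires and its ephemeral registration disappears). Below is a toy tracker with a manually supplied clock; the class name and timeout value are hypothetical, and real session timeouts are negotiated between client and server.

```python
class MembershipTracker:
    """Marks a member dead if no heartbeat arrives within `session_timeout`."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_seen = {}  # member id -> time of last heartbeat

    def heartbeat(self, member, now):
        self.last_seen[member] = now

    def expire(self, now):
        """Remove and return members whose sessions have timed out."""
        dead = sorted(m for m, t in self.last_seen.items()
                      if now - t > self.session_timeout)
        for m in dead:
            del self.last_seen[m]
        return dead

    def alive(self):
        return sorted(self.last_seen)


tracker = MembershipTracker(session_timeout=6.0)
tracker.heartbeat(1, now=0.0)
tracker.heartbeat(2, now=0.0)
tracker.heartbeat(1, now=5.0)    # broker 1 keeps heartbeating, broker 2 goes quiet
print(tracker.expire(now=7.0))   # [2]
print(tracker.alive())           # [1]
```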

ZooKeeper itself is a distributed system consisting of multiple nodes in an ensemble. ZooKeeper is a centralized service for maintaining such metadata.

ZooKeeper is a centralized management system for any kind of distributed system. A distributed system consists of different software modules running on different nodes/clusters (possibly in geographically distant locations) but working as one system. ZooKeeper facilitates communication between the nodes, shares configuration among them, and keeps track of which node is the leader, which nodes join or leave, etc. ZooKeeper is what keeps distributed systems sane and consistent. ZooKeeper is basically an orchestration platform.

Kafka is a distributed system, and hence it needs some kind of orchestration for its nodes, which might (or might not) be geographically distant.

This article explains the role of ZooKeeper in Kafka. It explains how Kafka is stateless and how ZooKeeper plays an important role in the distributed nature of Kafka (and of many other distributed systems).

The request to run Kafka without ZooKeeper seems to be quite common. The library Charlatan addresses this.

According to its description, Charlatan is more or less a mock of ZooKeeper, providing the ZooKeeper services backed either by other tools or by a database.

I encountered that library when working with the main product of Charlatan's authors; there it works fine.

Firstly

Apache ZooKeeper is a distributed store used to provide configuration and synchronization services in a highly available way. In more recent versions of Kafka, work was done so that client consumers no longer store information about how far they have consumed messages (called offsets) in ZooKeeper. This reduced usage did not remove the need for consensus and coordination in distributed systems, however. While Kafka provides fault tolerance and resilience, something is needed to provide the required coordination, and ZooKeeper provides that piece of the overall system.

Secondly

Agreeing on which broker is the leader of a partition is one example of the practical application of ZooKeeper within the Kafka ecosystem.
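When a partition leader fails, the controller picks the replacement and records it (in ZooKeeper mode, in the partition's state znode). As I understand Kafka's default clean election, it takes the first replica in the partition's assignment order that is both alive and in the in-sync replica set (ISR); only if `unclean.leader.election.enable` is on may it fall back to an out-of-sync live replica, at the risk of losing data. The sketch below is simplified and the function name is hypothetical.

```python
def elect_partition_leader(assignment, isr, live_brokers, allow_unclean=False):
    """Pick a new leader for one partition, or None if it must stay offline.

    assignment: replica broker ids in assignment order (first = preferred leader)
    isr: broker ids currently in the in-sync replica set
    live_brokers: broker ids currently registered as alive
    """
    for replica in assignment:
        if replica in live_brokers and replica in isr:
            return replica
    if allow_unclean:
        # Fallback in the spirit of unclean leader election: may lose data.
        for replica in assignment:
            if replica in live_brokers:
                return replica
    return None


print(elect_partition_leader([1, 2, 3], isr={1, 2}, live_brokers={2, 3}))  # 2
```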

ZooKeeper would work even if there were only a single broker.

These are from the Kafka in Action book. The image is from this course.
