
Kafka won't start if a Zookeeper node is down

I have Kafka and Zookeeper co-located on the same servers, with multiple nodes.

In Kafka's server.properties, I have a line like:

zookeeper.connect=server1:2181,server2:2181...
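For anyone scripting around this setting, here is a minimal Python sketch of how such a `zookeeper.connect` value can be split into individual servers. The function name `parse_zookeeper_connect` is hypothetical (it is not part of Kafka or ZooKeeper), the fallback to port 2181 when no port is given is an assumption based on ZooKeeper's conventional client port, and IPv6 literals are not handled.

```python
from typing import List, Tuple

DEFAULT_ZK_PORT = 2181  # assumption: ZooKeeper's conventional client port


def parse_zookeeper_connect(value: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split a zookeeper.connect value into (host, port) pairs and a chroot path.

    Hypothetical helper for illustration; it does not handle IPv6 literals.
    """
    value = value.strip()
    chroot = ""
    slash = value.find("/")
    if slash != -1:
        # An optional chroot path may follow the host list, e.g. "a:2181,b:2181/kafka".
        value, chroot = value[:slash], value[slash:]

    servers: List[Tuple[str, int]] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No explicit port: fall back to the assumed default.
            host, port = entry, str(DEFAULT_ZK_PORT)
        servers.append((host, int(port)))
    return servers, chroot


if __name__ == "__main__":
    print(parse_zookeeper_connect("server1:2181,server2:2181"))
```

Having the servers as a list makes it easy to probe each one separately instead of treating the connect string as a single opaque value.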

The problem is that Kafka will not start until all of the Zookeeper nodes are available. Otherwise, I get errors like "fatal error during Kafka startup" and "Timed out waiting for connection while in state: CONNECTING", even though the other Zookeeper nodes are up.

This makes it challenging to script the startup of each node independently, since the startup scripts on one node depend on the state of the other nodes.

First: is this expected behavior, or am I doing something wrong? Suppose I have 3 nodes in the Zookeeper cluster; do all 3 nodes have to be up for Kafka to start? That seems counterintuitive, since a larger cluster would actually increase the chance of failure on startup rather than provide more resiliency.

Second: what's a good solution for this? Is the only approach to make Kafka on each node wait until Zookeeper is fully up on all nodes?

As far as I know, this is a prerequisite for Kafka to start up correctly, and I don't think it is too much of a burden. If the Zookeeper cluster is already having problems at startup time, Kafka itself might run into problems, so ensuring that the Zookeeper cluster is healthy is a good initial check, IMHO.
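An initial health check like that could be scripted roughly as follows. This is only a sketch under several assumptions: it uses ZooKeeper's `ruok` four-letter command (a running server replies `imok`), which only shows that the server process is alive, not that it has joined a quorum, and which may need to be whitelisted on newer ZooKeeper releases. The names `zk_is_ok` and `wait_for_zookeeper` are hypothetical. By default it waits for a strict majority of servers; pass `required=len(servers)` if you want to wait for all of them.

```python
import socket
import time
from typing import Callable, Iterable, Optional, Tuple

Server = Tuple[str, int]


def zk_is_ok(server: Server, timeout: float = 2.0) -> bool:
    """Send the "ruok" four-letter command and check for the "imok" reply."""
    try:
        with socket.create_connection(server, timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, timeout, DNS failure, etc. all count as "not ok".
        return False


def wait_for_zookeeper(
    servers: Iterable[Server],
    required: Optional[int] = None,
    probe: Callable[[Server], bool] = zk_is_ok,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least `required` servers pass the probe, or give up."""
    servers = list(servers)
    if required is None:
        required = len(servers) // 2 + 1  # strict majority of the ensemble
    for attempt in range(attempts):
        healthy = sum(1 for server in servers if probe(server))
        if healthy >= required:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Host names are placeholders taken from the question's example.
    ensemble = [("server1", 2181), ("server2", 2181)]
    ok = wait_for_zookeeper(ensemble, attempts=3)
    print("Zookeeper ready" if ok else "Zookeeper not ready")
```

A startup script could run this before launching the broker and exit non-zero when it returns `False`, so that each node's script only depends on the check rather than on a fixed start order.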

A way to get around this limitation is to configure a single-node Zookeeper cluster and tell Kafka to use that cluster. Afterwards, you can grow the Zookeeper cluster to 3 or more nodes while Kafka is already up and running. More details can be found here: "Adding new ZooKeeper node in Kafka cluster?"

For the record, Kafka itself is completely fine if the Zookeeper cluster goes down once Kafka is up and running. It just won't be able to accept new producer/consumer connections or create topics, but the ones that are currently active on the cluster continue to work just fine.

We met the same problem in our production environment. It turned out to be a bug (ZOOKEEPER-2184) in the Zookeeper client library that Kafka uses to talk to Zookeeper.

Our Kafka version is 1.1.1, which uses zookeeper-3.4.10.jar.

After we replaced it with zookeeper-3.4.13.jar, Kafka could restart successfully.
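To see which Zookeeper client jar a given Kafka install ships, something like the following Python sketch may help. It assumes the jar is named `zookeeper-<version>.jar`, as in the file names above, and that it lives in the `libs` directory of the Kafka installation; the path in the example and the function names are hypothetical. Using 3.4.13 as the comparison point simply mirrors the version that worked for us, not an official statement of where the fix landed.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Matches e.g. "zookeeper-3.4.10.jar" but not "zookeeper-jute-3.5.7.jar".
_JAR_RE = re.compile(r"^zookeeper-(\d+(?:\.\d+)*)\.jar$")

# Version reported to work in this answer (an observation, not an official minimum).
KNOWN_GOOD = (3, 4, 13)


def zookeeper_jar_version(filename: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from a zookeeper jar file name, if it matches."""
    match = _JAR_RE.match(filename)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def find_zookeeper_jars(libs_dir: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """List (file name, version) for zookeeper jars found in a Kafka libs directory."""
    found = []
    for path in sorted(Path(libs_dir).glob("zookeeper-*.jar")):
        version = zookeeper_jar_version(path.name)
        if version is not None:
            found.append((path.name, version))
    return found


if __name__ == "__main__":
    # "/opt/kafka/libs" is a placeholder path; adjust it to your installation.
    for name, version in find_zookeeper_jars("/opt/kafka/libs"):
        status = "older than" if version < KNOWN_GOOD else "at or above"
        print(f"{name}: {status} 3.4.13")
```

Comparing versions as integer tuples avoids the string-comparison trap where "3.4.9" would sort after "3.4.13".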

