
Kafka cluster unavailable if a node in Zookeeper cluster dies

I am configuring a Kafka cluster of 3 brokers. The cluster makes use of a Zookeeper cluster of 3 nodes.

Using Docker, this is how I started my 3 Zookeeper nodes:

docker run --net=my_network --name zoo1 -d -e ZOO_MY_ID=1 -e ZOO_SERVERS="server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888" zookeeper

docker run --net=my_network --name zoo2 -d -e ZOO_MY_ID=2 -e ZOO_SERVERS="server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888" zookeeper

docker run --net=my_network --name zoo3 -d -e ZOO_MY_ID=3 -e ZOO_SERVERS="server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888" zookeeper

And this is how I started my 3 Kafka nodes:

docker run --net=my_network --name kafka1 -d -e KAFKA_ADVERTISED_PORT=9092 -e KAFKA_BROKER_ID=1 -e KAFKA_ZOOKEEPER_CONNECT="zoo1:2181,zoo2:2181,zoo3:2181" wurstmeister/kafka

docker run --net=my_network --name kafka2 -d -e KAFKA_ADVERTISED_PORT=9092 -e KAFKA_BROKER_ID=2 -e KAFKA_ZOOKEEPER_CONNECT="zoo1:2181,zoo2:2181,zoo3:2181" wurstmeister/kafka

docker run --net=my_network --name kafka3 -d -e KAFKA_ADVERTISED_PORT=9092 -e KAFKA_BROKER_ID=3 -e KAFKA_ZOOKEEPER_CONNECT="zoo1:2181,zoo2:2181,zoo3:2181" wurstmeister/kafka

The Zookeeper and Kafka clusters behave well when tested independently.

That is, I can connect to one of the Zookeeper nodes (say, zoo1) and create a znode. I can then stop that node (e.g., docker stop zoo1) and still query the znode from any other node in the Zookeeper cluster.
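That behaviour is what the ensemble size predicts. A Zookeeper ensemble keeps serving requests as long as a strict majority (a quorum) of its servers is up. The bash sketch below only does that arithmetic. The function names zk_quorum and zk_tolerated_failures are made up for illustration and are not part of any Zookeeper tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not Zookeeper tools): quorum arithmetic for an ensemble of n servers.

# Smallest strict majority of n servers.
zk_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many servers can fail while a majority is still up.
zk_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

for n in 3 4 5; do
  echo "ensemble=$n quorum=$(zk_quorum "$n") tolerated_failures=$(zk_tolerated_failures "$n")"
done
```

With 3 servers the quorum is 2, so one failure is tolerated. A fourth server raises the quorum to 3 without tolerating any extra failure, and 5 servers tolerate 2.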

The Kafka cluster also behaves well. Assuming all 3 Zookeeper nodes are up, I can create a topic, send a message, stop the leader broker, and verify that the message can still be consumed.

My problem is that the Kafka cluster stops working if one of the Zookeeper nodes dies.

For example, if I stop a Zookeeper node (e.g., docker stop zoo1) and then try to create a topic with this command:

 ./kafka-topics.sh --create --zookeeper "zoo1:2181,zoo2:2181,zoo3:2181" --replication-factor 3 --partitions 1 --topic my-replicated-topic

I get a ZkException whose root cause is an UnknownHostException:

Exception in thread "main" org.I0Itec.zkclient.exception.ZkException: Unable to connect to zoo1:2181,zoo2:2181,zoo3:2181
    at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:71)
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1227)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:156)
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:130)
    at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:75)
    at kafka.utils.ZkUtils$.apply(ZkUtils.scala:57)
    at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
    at kafka.admin.TopicCommand.main(TopicCommand.scala)
Caused by: java.net.UnknownHostException: zoo3: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
    at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:69)

But I do need the Kafka cluster to stay fully functional even if one of the machines hosting a Zookeeper node burns down. How can I achieve that resilience?

As the exception says, the host names may not be resolvable from wherever you are running the topic-creation command. Note that the name it fails on is zoo3, not the stopped zoo1. Try pinging zoo1, zoo2, and zoo3 to check whether they resolve to the correct IPs.
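The same check can be scripted. The sketch below splits a connect string like the one passed to --zookeeper and asks the system resolver about each host. It assumes a Linux machine with glibc's getent, and check_zk_hosts is a hypothetical name. Run it in the same container or host where kafka-topics.sh runs, since that is where resolution has to succeed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether each host in a Zookeeper connect string
# ("host:port,host:port,...") resolves on this machine. Returns 1 if any does not.
# Assumes Linux with glibc's getent.
check_zk_hosts() {
  local connect=$1 entry host addr status=0
  local -a entries
  IFS=',' read -ra entries <<< "$connect"
  for entry in "${entries[@]}"; do
    host=${entry%%:*}                      # drop the ":port" suffix
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -n "$addr" ]; then
      echo "ok          $host -> $addr"
    else
      echo "UNRESOLVED  $host"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_zk_hosts "zoo1:2181,zoo2:2181,zoo3:2181". Each unresolvable name is printed as UNRESOLVED, and the function returns a non-zero status.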

I don't think this is a Kafka problem. More likely, Zookeeper host-name resolution is not working correctly. I would first check whether the Zookeeper ensemble still works when you shut down one of its nodes, by creating a new znode and reading one created earlier. Also, try passing the Zookeeper IP addresses to kafka-topics.sh instead of the host names.

When you restart a Docker container (say, zoo1), it may come up with a new IP. Will the zoo1 host name still be resolvable from zoo2 and zoo3?
