
Does ZooKeeper survive the loss of one node in a three-node cluster?

  1. I saw a similar question, "Zookeeper instances in Kafka", but it remained unanswered.

So here is my extended version of the question, with more details.

  1. Environment: there are 3 nodes of a business application. Each application node embeds its own ZooKeeper node and its own Kafka node.

To head off puzzled questions, I should clarify: my business application is built on top of Elasticsearch with 3 nodes and minimumMasterNodes=2, so the application cluster tolerates 1 failure. I assumed that, in the same way, I could give each application node its own ZooKeeper node and Kafka node. The overall goal is to build inter-datacenter data replication for the business app on top of this stack, using Kafka MirrorMaker, with a fault tolerance of 1.

In my experiments I did not use the full stack of my business app, only ZooKeeper + Kafka inside each app node. Each app writes its log to the console, so I could determine which one had started ZooKeeper in LEADER mode.

My ZooKeeper ensemble configuration is:

server.1=localhost:2668:3668
server.2=localhost:2669:3669
server.3=localhost:2670:3670
syncLimit=5
initLimit=10
clientPort=*  # each node has its own port number: 2182, 2183, 2184 for servers 1, 2, 3 respectively
dataDir=D:\rtest\3-nodes\data\*\zoo   # * is 1, 2, 3 for servers 1, 2, 3 respectively
dataLogDir=D:\rtest\3-nodes\data\*\zoo\log # * is 1, 2, 3 for servers 1, 2, 3 respectively
  1. My fault scenario is:

     2.1. Start all three app nodes. Start a consumer (console output). Start the application that produces a sequence of messages. Make sure the consumer is receiving messages via the Kafka cluster.
     2.2. Kill the app whose ZooKeeper instance is the leader (in my case it was server #3).
     2.3. Observe that the consumer no longer outputs any new messages from the Kafka topic.
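
For step 2.2, the leader can also be identified without reading the console logs. The following is only a minimal sketch, assuming the `srvr` four-letter command is enabled on each client port (newer ZooKeeper releases restrict these commands through a whitelist setting). The helper names `parse_mode` and `query_mode` are made up for illustration; the ports are the ones from the configuration above.

```python
import socket
from typing import Optional


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line of a `srvr` response, or None if absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host: str, port: int, timeout: float = 2.0) -> Optional[str]:
    """Send `srvr` to a ZooKeeper client port and return the reported mode.

    Returns None if the connection fails or the reply has no 'Mode:' line
    (for example, when the server is up but not currently serving requests).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the connection after replying
                    break
                chunks.append(data)
    except OSError:
        return None
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


if __name__ == "__main__":
    # Client ports of servers 1, 2, 3 as listed in the configuration above.
    for port in (2182, 2183, 2184):
        print(port, query_mode("localhost", port))
```

Running this before and after killing a node should show whether the two surviving servers ever report a mode again, or whether they stay in the "not serving" state.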

From my point of view, the problem lies in ZooKeeper. It looks like the live ZooKeeper servers keep trying to reach the dropped server instead of agreeing on a quorum among themselves. By the way, in this state I cannot even use ZooKeeper from the console client (to be precise, I can connect to it, but at the first command, say "ls /", the console client fails with an exception). Here are excerpts of the logs produced by the surviving nodes 1 and 2:

Server 1:

15459 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2182] WARN  org.apache.zookeeper.server.quorum.Learner  - Exception when following the leader
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
15460 [Thread-3-SendThread(127.0.0.1:2184)] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x354b9dbe0b90001 for server 127.0.0.1/127.0.0.1:2184, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.read(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
15459 [Thread-3-SendThread(0:0:0:0:0:0:0:1:2184)] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x354b9dbe0b90000 for server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2184, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.read(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
15459 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection broken for id 3, my id = 1, error = java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
15462 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupting SendWorker
15462 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
15462 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Send worker leaving thread
15766 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
16481 [WorkerSender[myid=1]] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Unknown Source)
16596 [Thread-3-SendThread(127.0.0.1:2184)] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x354b9dbe0b90000 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
...

Server 2:

...
5118 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection broken for id 3, my id = 2, error =
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
5121 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupting SendWorker
5120 [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2183] WARN  org.apache.zookeeper.server.quorum.Learner  - Exception when following the leader
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
        at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
        at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
        at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
5119 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x254b9dbe0b20000 due to java.io.IOException: An existing connection was forcibly closed by the remote host
5122 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
5123 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Send worker leaving thread
5536 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
6143 [WorkerSender[myid=2]] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
java.net.ConnectException: Connection refused: connect
        at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
        at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
        at java.lang.Thread.run(Unknown Source)
....

By the way, an ensemble of 4 such nodes works perfectly for my requirements. So can anybody tell me whether a ZooKeeper cluster of 3 nodes can survive the death of one node? Or am I doing something wrong?

A cluster of 3 nodes can lose 1 node, and a cluster of 5 can lose 2. A similar question was asked here: ZooKeeper reliability - three versus five nodes
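
The numbers in this answer follow from majority-quorum arithmetic. As a small sketch, assuming the default majority quorum (no weights or hierarchical groups configured), with function names chosen here purely for illustration:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
    # prints:
    # 3 2 1
    # 4 3 1
    # 5 3 2
```

By this arithmetic, a 3-node ensemble should keep a quorum after losing one node, just as the 4-node ensemble does, so the behaviour described in the question points to something other than ensemble size.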
