
Zookeeper error: Cannot open channel to X at election address

I installed ZooKeeper on 3 different AWS servers. The configuration on all servers is as follows:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=x.x.x.x:2888:3888
server.2=x.x.x.x:2888:3888
server.3=x.x.x.x:2888:3888

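All three instances have a myid file in /var/zookeeper containing the appropriate id. All ports on the three servers are open in the AWS console. But when I run the ZooKeeper server, I get the following error on every instance: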

2015-06-19 12:09:22,989 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] 
  - Cannot open channel to 2 at election address /x.x.x.x:3888
java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:579)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
  at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
  at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-06-19 12:09:23,170 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382]
   - Cannot open channel to 3 at election address /x.x.x.x:3888
java.net.ConnectException: Connection refused
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:579)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
  at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
  at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
  at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-06-19 12:09:23,170 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 25600
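To narrow down a "Connection refused" like the one above, it can help to probe each peer's election port directly from one node. A minimal Python sketch, assuming zoo.cfg uses the host:quorumPort:electionPort form shown in the question (IPv6 literals are not handled). `parse_servers` and `probe` are hypothetical helpers, not part of ZooKeeper, and the 3-second timeout is an arbitrary choice:

```python
import socket
from typing import Dict, Tuple

def parse_servers(cfg_text: str) -> Dict[int, Tuple[str, int, int]]:
    """Map each server id to (host, quorum_port, election_port) from zoo.cfg text."""
    servers: Dict[int, Tuple[str, int, int]] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line.startswith("server.") or "=" not in line:
            continue  # skips comments, blank lines and non-server keys
        key, value = line.split("=", 1)
        sid = int(key.strip()[len("server."):])
        # Ignore a 3.5-style ";clientPort" suffix and an optional ":participant"/":observer" role.
        addr = value.strip().split(";", 1)[0]
        host, quorum_port, election_port = addr.split(":")[:3]
        servers[sid] = (host, int(quorum_port), int(election_port))
    return servers

def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror:
        return "unresolvable"          # name lookup failed (DNS or /etc/hosts)
    except ConnectionRefusedError:
        return "refused"               # host reached, but nothing listening on host:port
    except OSError:
        return "timeout/unreachable"   # e.g. packets dropped by a firewall, no route
```

For example, loop over `parse_servers(open("conf/zoo.cfg").read())` and print `probe(host, election_port)` for each peer. "refused" usually means the peer machine was reached but no process is listening on that address and port (the peer is not running, or bound its listener elsewhere); a timeout more often points at a security group or firewall silently dropping packets.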

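How is the local server's IP defined on each node? If you gave the public IP, the listener will not be able to bind to that address. You must specify 0.0.0.0 for the current node: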

server.1=0.0.0.0:2888:3888
server.2=192.168.10.10:2888:3888
server.3=192.168.2.1:2888:3888

This change must be made on the other nodes as well, each node using 0.0.0.0 for its own server.N entry.
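Because each node needs a slightly different server list, it can be generated from one shared host list. A small Python sketch, assuming the hosts are listed in myid order (the first host is server.1) and the default 2888/3888 ports; `server_lines` is a hypothetical helper:

```python
from typing import List

def server_lines(hosts: List[str], my_id: int,
                 quorum_port: int = 2888, election_port: int = 3888) -> List[str]:
    """Build the server.N lines for the node whose myid is my_id.

    hosts[0] becomes server.1, hosts[1] server.2, and so on; the local
    node's own entry is replaced with 0.0.0.0 so it binds on all interfaces.
    """
    lines = []
    for sid, host in enumerate(hosts, start=1):
        addr = "0.0.0.0" if sid == my_id else host
        lines.append(f"server.{sid}={addr}:{quorum_port}:{election_port}")
    return lines
```

For node 1 with the peers above, `server_lines(["<node1-ip>", "192.168.10.10", "192.168.2.1"], 1)` yields the three lines shown, with 0.0.0.0 in the first entry.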

I had the same problem and solved it.

Make sure myid matches what you configured in zoo.cfg.

Check the zoo.cfg file in your conf directory, which contains entries like these:

server.1=zookeeper1:2888:3888  
server.2=zookeeper2:2888:3888  
server.3=zookeeper3:2888:3888  

Then check the myid file in each server's dataDir directory. For example:

Suppose the dataDir defined in zoo.cfg is '/home/admin/data'.

Then on zookeeper1 you must have a file named myid in that directory containing the value 1; on zookeeper2, a file named myid containing 2; and on zookeeper3, a file named myid containing 3.

If it is not configured this way, the server will listen on the wrong ip:port.
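The check described above can be scripted. A minimal Python sketch, assuming zoo.cfg uses plain key=value lines; `check_myid` is a hypothetical helper, not part of ZooKeeper. It reads dataDir from the config text, loads dataDir/myid, and confirms that a matching server.N entry exists:

```python
from pathlib import Path

def check_myid(cfg_text: str) -> str:
    """Return a short verdict on whether dataDir/myid matches a server.N entry."""
    props = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    data_dir = props.get("dataDir")
    if data_dir is None:
        return "dataDir is not set"
    myid_file = Path(data_dir) / "myid"
    if not myid_file.is_file():
        return f"{myid_file} is missing"
    content = myid_file.read_text().strip()
    if not content.isdecimal():
        return f"{myid_file} should contain only a number, found {content!r}"
    sid = int(content)
    if f"server.{sid}" not in props:
        return f"myid is {sid} but zoo.cfg has no server.{sid} entry"
    return f"ok: myid {sid} -> {props[f'server.{sid}']}"
```

Run it on each node, for example with `check_myid(Path("conf/zoo.cfg").read_text())`.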

This worked for me:

Step 1:
Node 1:
zoo.cfg
server.1= 0.0.0.0:<port>:<port2>
server.2= <IP>:<port>:<port2>
.
.
.
server.n= <IP>:<port>:<port2>

Node 2 :
server.1= <IP>:<port>:<port2>
server.2= 0.0.0.0:<port>:<port2>
.
.
.
server.n= <IP>:<port>:<port2>


Step 2:
Now, in the directory defined by dataDir in your zoo.cfg:
Node 1:
echo 1 > <datadir>/myid

Node 2:
echo 2 > <datadir>/myid

.
.
.

Node n:
echo n > <datadir>/myid

This helped me start ZooKeeper successfully, but I'll know more once I start playing with it. Hope this helps.

Here is some Ansible Jinja2 template code that automatically builds the cluster section of zoo.cfg, using 0.0.0.0 as the host for the local node:

{#- server ids use loop.index (1-based) so they match myid files containing 1..N -#}
{% for url in zookeeper_hosts_list %}
  {%- set url_host = url.split(':')[0] -%}
  {%- if url_host == ansible_fqdn or url_host in ansible_all_ipv4_addresses -%}
server.{{loop.index}}=0.0.0.0:2888:3888
{% else %}
server.{{loop.index}}={{url_host}}:2888:3888
{% endif %}
{% endfor %}

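If your own hostname resolves to 127.0.0.1 (in my case, the hostname was in /etc/hosts), ZooKeeper will not start unless zoo.cfg uses 0.0.0.0 for the local node. But if your hostname resolves to the machine's actual IP, you can put the machine's own hostname in the config file.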
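To check which case applies on a given machine, a minimal Python sketch that asks the system resolver what the local hostname maps to and reports whether every result is a loopback address (127.0.0.0/8 or ::1). The helper names are illustrative assumptions; a True result suggests using 0.0.0.0 for this node's own server.N entry:

```python
import ipaddress
import socket
from typing import List, Optional

def resolved_addresses(hostname: Optional[str] = None) -> List[str]:
    """Return the distinct addresses the resolver gives for hostname (default: this host)."""
    name = hostname or socket.gethostname()
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string.
    return sorted({info[4][0] for info in infos})

def only_loopback(addresses: List[str]) -> bool:
    """True if there is at least one address and all of them are loopback."""
    # Strip any IPv6 scope suffix such as "%eth0" before parsing.
    return bool(addresses) and all(
        ipaddress.ip_address(a.split("%", 1)[0]).is_loopback for a in addresses
    )
```

`only_loopback(resolved_addresses())` checks the current machine's own hostname.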

Had a similar problem on a 3-node ZooKeeper ensemble. The solution was to follow espeirasbora's suggestion and restart.

So here is what I did.

Nodes: zookeeper1, zookeeper2 and zookeeper3

A. Problem: the ZooKeeper nodes in my ensemble would not start

B. System setup: 3 ZooKeeper nodes on 3 machines

C. Error:

In my ZooKeeper log file I can see the following errors:

2016-06-26 14:10:17,484 [myid:1] - WARN  [SyncThread:1:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:1 took 1340ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-06-26 14:10:17,847 [myid:1] - WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@810] - Connection broken for id 2, my id = 1, error = 
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:795)
2016-06-26 14:10:17,848 [myid:1] - WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@813] - Interrupting SendWorker
2016-06-26 14:10:17,849 [myid:1] - WARN  [SendWorker:2:QuorumCnxManager$SendWorker@727] - Interrupted while waiting for message on queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
    at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:879)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:65)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:715)
2016-06-26 14:10:17,851 [myid:1] - WARN  [SendWorker:2:QuorumCnxManager$SendWorker@736] - Send worker leaving thread
2016-06-26 14:10:17,852 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:846)
2016-06-26 14:10:17,854 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower

D. Action and solution:

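On each node:
a. Edited the config file $ZOOKEEPER_HOME/conf/zoo.cfg, setting that machine's own address to "0.0.0.0" while keeping the addresses of the other 2 nodes.
b. Restarted the node.
c. Checked the status.
d. Voila, all good.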

See below:

-------------------------------------------------

On zookeeper1

#Before modification 
[zookeeper1]$ tail -3   $ZOOKEEPER_HOME/conf/zoo.cfg 
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

#After  modification 
[zookeeper1]$ tail -3  $ZOOKEEPER_HOME/conf/zoo.cfg 
server.1=0.0.0.0:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

#Start ZooKeeper (stop and start, or restart)
[zookeeper1]$ $ZOOKEEPER_HOME/bin/zkServer.sh  start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

[zookeeper1]$ $ZOOKEEPER_HOME/bin/zkServer.sh  status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

-------------------------------------------------

On zookeeper2

#Before modification 
[zookeeper2]$ tail -3   $ZOOKEEPER_HOME/conf/zoo.cfg 
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

#After  modification 
[zookeeper2]$ tail -3  $ZOOKEEPER_HOME/conf/zoo.cfg 
server.1=zookeeper1:2888:3888
server.2=0.0.0.0:2888:3888
server.3=zookeeper3:2888:3888

#Start ZooKeeper (stop and start, or restart)
[zookeeper2]$ $ZOOKEEPER_HOME/bin/zkServer.sh  start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

[zookeeper2]$ $ZOOKEEPER_HOME/bin/zkServer.sh  status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

-------------------------------------------------

On zookeeper3

#Before modification 
[zookeeper3]$ tail -3   $ZOOKEEPER_HOME/conf/zoo.cfg 
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

#After  modification 
[zookeeper3]$ tail -3  $ZOOKEEPER_HOME/conf/zoo.cfg 
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=0.0.0.0:2888:3888

#Start ZooKeeper (stop and start, or restart)
[zookeeper3]$ $ZOOKEEPER_HOME/bin/zkServer.sh  start
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

[zookeeper3]$ $ZOOKEEPER_HOME/bin/zkServer.sh  status
ZooKeeper JMX enabled by default
ZooKeeper remote JMX Port set to 52128
ZooKeeper remote JMX authenticate set to false
ZooKeeper remote JMX ssl set to false
ZooKeeper remote JMX log4j set to true
Using config: /opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Mode: follower

In my case, the problem was that I had to start all three ZooKeeper servers; only then could I connect to the ZooKeeper server with ./zkCli.sh.
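This follows from ZooKeeper's default quorum rule, a strict majority: a leader can only be elected, and clients only served, once more than half of the servers listed in zoo.cfg are up and can reach one another. A small Python sketch of that arithmetic (the function names are hypothetical):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble: floor(n / 2) + 1."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)
```

For the 3-server ensembles in this thread, `quorum_size(3)` is 2, so a single node started on its own stays in leader election and will not serve zkCli.sh until at least one more peer joins. A 4-server ensemble still tolerates only one failure, which is why odd sizes are the usual choice.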

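Adding some information about ZooKeeper clusters inside an Amazon VPC. The '0.0.0.0' solution works when ZooKeeper runs directly on EC2 instances; if you are using Docker, '0.0.0.0' will not work correctly with ZooKeeper 3.5.x after a node restarts.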

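The problem lies in how '0.0.0.0' is resolved and how the ensemble shares node addresses and SID order (if you start the nodes in descending order, the problem may not occur).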

So far, the only working solution is to upgrade to version 3.6.2+.

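We hit the same issue; in our case, the root cause was too many client connections. The default ulimit on AWS EC2 instances is 1024, which prevented the ZooKeeper nodes from communicating with each other.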

The fix was to raise the ulimit to a higher number (ulimit -n 20000), then stop and start ZooKeeper from that same shell so it inherits the new limit.
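The open-files limit that `ulimit -n` changes is a per-process resource limit inherited by child processes. A minimal Python sketch using the standard `resource` module (Unix only) that reads the limit and raises the soft limit toward a target, never above the hard limit. The helper names are hypothetical, and it only helps if ZooKeeper is then launched from this same process:

```python
import resource
from typing import Tuple

def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def raise_soft_nofile(target: int) -> int:
    """Raise the soft open-files limit toward target, capped at the hard limit.

    Never lowers the limit. Returns the soft limit now in effect; child
    processes started afterwards inherit it, as with `ulimit -n` in a shell.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return nofile_limits()[0]
```

A soft limit of 1024 from `nofile_limits()` matches the default described above; raising the hard limit itself requires root.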

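I had a similar problem. Two of my three ZooKeeper nodes reported their status as "standalone", even though the zoo.cfg file indicated they should be clustered. My third node failed to start with the error you describe. I think what fixed it for me was running zkServer.sh start on all three nodes in quick succession, so that ZooKeeper was running everywhere before the zoo.cfg initLimit was reached. Hope this is useful to someone out there.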
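initLimit and syncLimit are counted in ticks, so the actual time window is the value multiplied by tickTime (in milliseconds). A small Python sketch of that conversion, assuming zoo.cfg's plain key=value format with all three keys present; `limit_windows_ms` is a hypothetical helper:

```python
from typing import Dict

def limit_windows_ms(cfg_text: str) -> Dict[str, int]:
    """Convert initLimit/syncLimit (in ticks) from zoo.cfg text into milliseconds."""
    props: Dict[str, str] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    tick_ms = int(props["tickTime"])  # raises KeyError if a key is missing
    return {
        "initLimit_ms": int(props["initLimit"]) * tick_ms,
        "syncLimit_ms": int(props["syncLimit"]) * tick_ms,
    }
```

With the configuration from the question (tickTime=2000, initLimit=10, syncLimit=5), this gives 20000 ms for initLimit and 10000 ms for syncLimit.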

I had the same error log; in my case, I was using the nodes' hostnames in zookeeper.conf.

My nodes are CentOS 8 virtual machines.

As @user2286693 said, my mistake was in name resolution:

From node1, when I ping node1:

PING node1(localhost (::1)) 56 data bytes

I checked my /etc/hosts file and found:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 node1

I replaced that line with:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

And it works!

Hope this helps someone!
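A minimal Python sketch that scans /etc/hosts-style text and lists every name, other than the usual localhost aliases, bound to a loopback address, which is the node1 situation above. The alias set is an assumption based on common CentOS and Debian defaults, and `loopback_aliases` is a hypothetical helper:

```python
import ipaddress
from typing import Dict, List

# Names that are expected to point at loopback (assumed common distribution defaults).
LOCALHOST_NAMES = {
    "localhost", "localhost.localdomain", "localhost4", "localhost4.localdomain4",
    "localhost6", "localhost6.localdomain6", "ip6-localhost", "ip6-loopback",
}

def loopback_aliases(hosts_text: str) -> Dict[str, List[str]]:
    """Map each unexpected name bound to a loopback address to the addresses it is bound to."""
    found: Dict[str, List[str]] = {}
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        addr, names = fields[0], fields[1:]
        try:
            if not ipaddress.ip_address(addr).is_loopback:
                continue
        except ValueError:
            continue  # first field is not an IP address; skip the malformed line
        for name in names:
            if name not in LOCALHOST_NAMES:
                found.setdefault(name, []).append(addr)
    return found
```

Pass it `Path("/etc/hosts").read_text()`; any cluster hostname in the result resolves to the local machine itself, so the other nodes cannot reach it through that name.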

When you hit this problem, you will see something like this:

org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]

This indicates that the cause is a network communication problem with ZooKeeper.

How to fix it

Scale zk down to 0, then back up to 3. Wait until they all show Ready.

Now open a shell in zk-0 with oc rsh zk-0 and run the following command:

/opt/fusion/bin/zookeeper-client
Connecting to zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983

(--- paused for a moment here ---)

Welcome to ZooKeeper!
JLine support is enabled

[zk: zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983(CONNECTING) 0] 

Notice that it still shows "CONNECTING". This means you are not communicating successfully with ZooKeeper.

When this happens, you will see the following in /opt/fusion/var/log/zookeeper/zookeeper.log:

2021-04-17T00:45:52,848 - WARN  [WorkerSender[myid=1]:QuorumCnxManager@584] - Cannot open channel to 2 at election address zk-2.zk:3888
java.net.UnknownHostException: zk-2.zk
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184) ~[?:1.8.0_262]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_262]
        at java.net.Socket.connect(Socket.java:607) ~[?:1.8.0_262]
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435) [zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]

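This is actually the infamous "no route to host" exception that we occasionally hit on OpenShift pods. When it happens, ZooKeeper shows Ready but cannot communicate with the other ZooKeeper nodes, so in a real sense it is not ready.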
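Since the log shows java.net.UnknownHostException, the pod could not even resolve zk-2.zk, so a quick check before and after scaling is whether each peer name resolves from inside the pod. A minimal Python sketch, assuming a Python 3 interpreter is available in the container; `resolve_peers` is a hypothetical helper:

```python
import socket
from typing import Dict, List

def resolve_peers(hosts: List[str]) -> Dict[str, List[str]]:
    """Resolve each peer hostname; an empty list means the lookup failed."""
    results: Dict[str, List[str]] = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[host] = []  # the case Java reports as UnknownHostException
    return results
```

For the pods above this would be `resolve_peers(["zk-0.zk", "zk-1.zk", "zk-2.zk"])`; an empty list for any name corresponds to the failure in the log.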

So how do you fix it?

Scale the zk statefulset to 0, then back to 3.

Repeat until you connect successfully:

/opt/fusion/bin/zookeeper-client
Connecting to zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: zk-0.zk:9983,zk-1.zk:9983,zk-2.zk:9983(CONNECTED) 0]

Note the CONNECTED state.

Now you can restart the rest of the services that depend on zk.

I got the same result because the quorum server port 3181 was still in use by another service; changing the port fixed it.
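To confirm that kind of clash before starting ZooKeeper, you can try to bind the port the way a listener would. A minimal Python sketch; `port_in_use` is a hypothetical helper, and it should be run while ZooKeeper is stopped, otherwise it will just detect ZooKeeper's own listener:

```python
import errno
import socket

def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """True if binding host:port fails with EADDRINUSE.

    Without SO_REUSEADDR, a port held only by connections in TIME_WAIT can
    also be reported as in use on Linux.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other errors (e.g. permission denied) are not "in use"
    return False
```

For the case above, `port_in_use(3181)` returning True while ZooKeeper is down means another process already owns the port.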


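Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.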

 