How to setup zookeeper cluster on docker swarm
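Environment: a 6-server Docker Swarm cluster (2 managers and 4 workers).
Requirement: we need to set up a ZooKeeper cluster on the existing Docker Swarm.
Blocker: to set up ZooKeeper as a cluster, we need to list all zk servers in each server's configuration and provide a unique ID in the myid file.
Question: when we create replicas of ZooKeeper in Docker Swarm, how do we give each replica a unique ID? Also, how do we update the zoo.cfg config file with the ID of each ZooKeeper container?
This is currently not a simple ask. Fully scalable stateful application clusters are tricky when each cluster member needs a unique identity and its own storage volume.
On Docker Swarm, today, you are perhaps best advised to run each cluster member as a separate service in your compose file (see 31z4/zookeeper-docker):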
version: '2'
services:
  zoo1:
    image: 31z4/zookeeper
    restart: always
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  zoo2:
    image: 31z4/zookeeper
    restart: always
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
..
..
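Every service in the compose file above repeats the same ZOO_SERVERS value and differs only in ZOO_MY_ID, so the shared value can be generated instead of typed by hand. The following is a minimal sketch, assuming the members are named zoo1..zooN and use ports 2888 and 3888 as in the example; the function name `zoo_servers` is hypothetical and not part of the image.

```shell
# Hypothetical helper: build the ZOO_SERVERS value for an ensemble of N members.
# Assumes service names zoo1..zooN and the 2888 (quorum) / 3888 (election) ports.
zoo_servers() {
  count="$1"
  servers=""
  i=1
  while [ "$i" -le "$count" ]; do
    servers="${servers}server.${i}=zoo${i}:2888:3888 "
    i=$((i + 1))
  done
  # Drop the trailing space before printing
  printf '%s\n' "${servers% }"
}

zoo_servers 3
```

Each service would still need its own ZOO_MY_ID, which is the part Swarm cannot assign per replica.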
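For a state-of-the-art (but still evolving) solution, I recommend checking out Kubernetes.
The new concept of StatefulSets offers much promise. I expect Docker Swarm will grow a similar capability in time, where each container instance is assigned a unique and "sticky" hostname that can be used as the basis for a unique identifier.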
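I have been trying to deploy a ZooKeeper cluster in Docker swarm mode.
I have deployed 3 machines connected to a Docker swarm network. My requirement is to run 3 ZooKeeper instances, one on each of these nodes, which together form an ensemble. I have gone through this thread and got a few insights on how to deploy ZooKeeper in Docker swarm.
As @junius suggested, I created the docker compose file. I removed the constraints, because Docker swarm ignores them. See https://forums.docker.com/t/docker-swarm-constraints-being-ignored/31555
My ZooKeeper docker compose file looks like this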
version: '3.3'
services:
  zoo1:
    image: zookeeper:3.4.12
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /etc/localtime:/etc/localtime:ro
  zoo2:
    image: zookeeper:3.4.12
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /etc/localtime:/etc/localtime:ro
  zoo3:
    image: zookeeper:3.4.12
    hostname: zoo3
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /etc/localtime:/etc/localtime:ro
networks:
  net:
Deployed using the docker stack command.
docker stack deploy -c zoo3.yml zk
Creating network zk_net
Creating service zk_zoo3
Creating service zk_zoo1
Creating service zk_zoo2
The ZooKeeper services come up fine, one on each node, without any issues.
docker stack services zk
ID             NAME      MODE         REPLICAS   IMAGE              PORTS
rn7t5f3tu0r4   zk_zoo1   replicated   1/1        zookeeper:3.4.12   0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
u51r7bjwwm03   zk_zoo2   replicated   1/1        zookeeper:3.4.12   0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
zlbcocid57xz   zk_zoo3   replicated   1/1        zookeeper:3.4.12   0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
I reproduced the issue discussed here when I stopped the ZooKeeper stack and started it again.
docker stack rm zk
docker stack deploy -c zoo3.yml zk
This time the ZooKeeper cluster did not form. The docker instance logged the following
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-11-02 15:24:41,531 [myid:2] - WARN [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.4:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:24:41,538 [myid:2] - WARN [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 3 at election address zoo3/10.0.0.2:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:38:19,146 [myid:2] - WARN [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=1, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2018-11-02 15:38:20,147 [myid:2] - WARN [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=2, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
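On closer observation I found that the first time I deployed this stack, the ZooKeeper instance with id 2 was running on node 1. This created a myid file with the value 2.
cat /home/zk/data/myid
2
When I stopped the stack and started it again, I found that this time the ZooKeeper instance with id 3 was running on node 1.
docker ps
CONTAINER ID   IMAGE              COMMAND                 CREATED         STATUS         PORTS                                                                    NAMES
566b68c11c8b   zookeeper:3.4.12   "/docker-entrypoin..."   6 minutes ago   Up 6 minutes   0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp   zk_zoo3.1.7m0hq684pkmyrm09zmictc5bm
But the myid file still had the value 2, which was set by the earlier instance.
That is why the log shows [myid:2]: the instance tries to connect to the instances with id 1 and 3 and fails.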
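One way to spot this kind of mismatch quickly is to pull the id out of the log prefix and compare it with the id the service was configured with. The following is a rough sketch, assuming the `[myid:N]` log prefix shown in the log output above; the helper name `logged_myid` is hypothetical.

```shell
# Hypothetical helper: print the id reported in the first "[myid:N]" log prefix.
# Reads log text on stdin; lines without a numeric id are skipped.
logged_myid() {
  sed -n 's/.*\[myid:\([0-9][0-9]*\)\].*/\1/p' | head -n 1
}

line='2018-11-02 15:24:41,531 [myid:2] - WARN [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.4:3888'
printf '%s\n' "$line" | logged_myid
```

Against a live stack the input could come from something like `docker service logs zk_zoo3 2>&1 | logged_myid`; a result other than that service's ZOO_MY_ID would point to a stale myid file.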
On further debugging I found that the docker-entrypoint.sh file contains the following code
# Write myid only if it doesn't exist
if [[ ! -f "$ZOO_DATA_DIR/myid" ]]; then
    echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
fi
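The effect of that check can be reproduced outside a container. The following sketch mimics the write-if-absent logic against a scratch directory standing in for the bind-mounted /home/zk/data; the function name `write_myid_if_absent` is made up for illustration.

```shell
# Mimics the entrypoint's write-if-absent logic (hypothetical helper name).
write_myid_if_absent() {
  data_dir="$1"
  my_id="$2"
  if [ ! -f "$data_dir/myid" ]; then
    echo "${my_id:-1}" > "$data_dir/myid"
  fi
}

data_dir="$(mktemp -d)"                # stands in for the host's /home/zk/data
write_myid_if_absent "$data_dir" 2     # first deploy: zoo2 lands on this node
write_myid_if_absent "$data_dir" 3     # redeploy: zoo3 lands on the same node
cat "$data_dir/myid"                   # still holds the id from the first deploy
```

Because the data directory lives on the host, the file written by the first container survives the redeploy and the second call is a no-op.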
This was causing the problem for me. I edited docker-entrypoint.sh with the following,
if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi
echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
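and mounted docker-entrypoint.sh in my compose file.
With this fix, I am able to stop and start my stack multiple times, and every time my ZooKeeper cluster is able to form an ensemble without hitting the connection issue.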
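As a side note, the delete-then-write step could probably be collapsed into a single write, since `>` redirection truncates an existing file. This is a sketch of that variant against a scratch directory, not the code used in the answer; `write_myid` is a hypothetical name.

```shell
# Hypothetical helper: always (re)write myid from the id passed in,
# relying on '>' to truncate any file left by an earlier container.
write_myid() {
  data_dir="$1"
  my_id="$2"
  echo "${my_id:-1}" > "$data_dir/myid"
}

data_dir="$(mktemp -d)"      # stands in for the host's /home/zk/data
write_myid "$data_dir" 2     # first deploy: zoo2 lands on this node
write_myid "$data_dir" 3     # redeploy: zoo3 lands on the same node
cat "$data_dir/myid"         # now reflects the most recent id
```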
My docker-entrypoint.sh file is as follows
#!/bin/bash
set -e

# Allow the container to be started with `--user`
if [[ "$1" = 'zkServer.sh' && "$(id -u)" = '0' ]]; then
    chown -R "$ZOO_USER" "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR"
    exec su-exec "$ZOO_USER" "$0" "$@"
fi

# Generate the config only if it doesn't exist
if [[ ! -f "$ZOO_CONF_DIR/zoo.cfg" ]]; then
    CONFIG="$ZOO_CONF_DIR/zoo.cfg"
    echo "clientPort=$ZOO_PORT" >> "$CONFIG"
    echo "dataDir=$ZOO_DATA_DIR" >> "$CONFIG"
    echo "dataLogDir=$ZOO_DATA_LOG_DIR" >> "$CONFIG"
    echo "tickTime=$ZOO_TICK_TIME" >> "$CONFIG"
    echo "initLimit=$ZOO_INIT_LIMIT" >> "$CONFIG"
    echo "syncLimit=$ZOO_SYNC_LIMIT" >> "$CONFIG"
    echo "maxClientCnxns=$ZOO_MAX_CLIENT_CNXNS" >> "$CONFIG"
    for server in $ZOO_SERVERS; do
        echo "$server" >> "$CONFIG"
    done
fi

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi
echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

exec "$@"
My docker compose file is as follows
version: '3.3'
services:
  zoo1:
    image: zookeeper:3.4.12
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
      - /etc/localtime:/etc/localtime:ro
  zoo2:
    image: zookeeper:3.4.12
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
      - /etc/localtime:/etc/localtime:ro
  zoo3:
    image: zookeeper:3.4.12
    hostname: zoo3
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
      - /etc/localtime:/etc/localtime:ro
networks:
  net:
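With this, I am able to get the ZooKeeper instances up and running in Docker swarm mode without hard-coding any hostname in the compose file. If one of my nodes goes down, its service is started on any available node in the swarm without any issues.
Thanks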