
How to setup zookeeper cluster on docker swarm

Environment: a six-server Docker Swarm cluster (2 managers and 4 workers).

Requirement: we need to set up a ZooKeeper cluster on the existing Docker Swarm.

Blocker: to set up ZooKeeper as a cluster, we need to list all the ZK servers in the configuration of every server, and provide a unique ID in each server's myid file.

Question: when we create replicas of ZooKeeper in Docker Swarm, how do we give each replica a unique ID? And how do we update the zoo.cfg configuration file with the ID of each ZooKeeper container?
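For reference, the static configuration ZooKeeper expects on each member looks roughly like the fragment below; the hostnames zoo1..zoo3 are placeholders, and the point of the question is that the myid value must differ per member while zoo.cfg is identical everywhere:

```
# zoo.cfg (identical on every ensemble member)
clientPort=2181
dataDir=/data
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

# /data/myid (different on every member: "1" on zoo1, "2" on zoo2, ...)
2
```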

This is not a simple problem right now. Fully scalable stateful application clusters are tricky when each cluster member needs a unique identity and its own storage volume.

On Docker Swarm, today, the best advice is to run each cluster member as a separate service in your compose file (see 31z4/zookeeper-docker):

version: '2'
services:
    zoo1:
        image: 31z4/zookeeper
        restart: always
        ports:
            - 2181:2181
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888

    zoo2:
        image: 31z4/zookeeper
        restart: always
        ports:
            - 2182:2181
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
..
..

For the most state-of-the-art (but still evolving) solution, I recommend looking at Kubernetes:

The new concept of StatefulSets is very promising. I expect Docker Swarm to gain similar functionality in time, assigning each container instance a unique "sticky" hostname that can be used as the basis for a unique identifier.
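As a sketch of that idea: if Swarm provided stable ordinal hostnames the way StatefulSets do (e.g. "zk-0", "zk-1", ...), the unique id could be derived at startup instead of hard-coded. The hostname pattern and helper name below are hypothetical:

```shell
# Hypothetical helper: derive a ZooKeeper myid from a hostname that ends in
# a numeric ordinal (the pattern Kubernetes StatefulSets provide, e.g. "zk-0").
# ZooKeeper ids must be >= 1, so the ordinal is shifted by one.
ordinal_from_hostname() {
    host="$1"
    ordinal="${host##*-}"   # keep everything after the last "-"
    echo $((ordinal + 1))
}
```

On such a platform this could be used as `ZOO_MY_ID=$(ordinal_from_hostname "$(hostname)")` before starting the server.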

We have created a Docker image that extends the official one. Its entrypoint.sh has been modified so that whenever a container starts, it automatically discovers the rest of the ZooKeeper nodes and configures the current node appropriately.

You can find the image on the Docker Store and in our GitHub.

Note: currently it does not handle situations such as recreating containers after failures.

Edit (6/11/2018)

The latest image supports reconfiguring the ZooKeeper cluster when:

  • the Docker service is scaled up (more containers are added)
  • the Docker service is scaled down (containers are removed)
  • a container is rescheduled by Docker Swarm after a failure (and gets assigned a new IP)
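The answer does not show how the reconfiguration works; a minimal sketch of one piece of it, assuming the ordered peer list has already been discovered (for example by resolving Swarm's `tasks.<service>` DNS name), is building the `server.N` entries that zoo.cfg expects:

```shell
# Sketch: turn an ordered list of peer hostnames/IPs into the
# "server.N=host:2888:3888" entries ZooKeeper's zoo.cfg expects.
build_zoo_servers() {
    i=1
    out=""
    for peer in "$@"; do
        out="$out server.$i=$peer:2888:3888"
        i=$((i + 1))
    done
    echo "${out# }"   # trim the leading space
}
```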

I have been trying to deploy a ZooKeeper cluster in Docker Swarm mode.

I have three machines connected to a Docker Swarm network. My requirement is to run three ZooKeeper instances, one on each node, forming an ensemble. I have read up on this topic, but there is little information on how to deploy ZooKeeper in Docker Swarm.

As @junius suggested, I created a Docker compose file. I removed the placement constraints, because Docker Swarm ignores them; see https://forums.docker.com/t/docker-swarm-constraints-being-ignored/31555

My ZooKeeper compose file looks like this:

version: '3.3'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
    zoo3:
        image: zookeeper:3.4.12
        hostname: zoo3
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 3
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
networks:
    net:

Deployed with the docker stack command:

docker stack deploy -c zoo3.yml zk
Creating network zk_net
Creating service zk_zoo3
Creating service zk_zoo1
Creating service zk_zoo2

The ZooKeeper services came up fine on every node, without any issues.

docker stack services zk
ID             NAME      MODE        REPLICAS  IMAGE             PORTS
rn7t5f3tu0r4   zk_zoo1   replicated  1/1       zookeeper:3.4.12  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
u51r7bjwwm03   zk_zoo2   replicated  1/1       zookeeper:3.4.12  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
zlbcocid57xz   zk_zoo3   replicated  1/1       zookeeper:3.4.12  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp

I reproduced the issue discussed here when I stopped and started the ZooKeeper stack again:

docker stack rm zk
docker stack deploy -c zoo3.yml zk

This time no ZooKeeper ensemble was formed. The Docker instances logged the following:

ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-11-02 15:24:41,531 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.4:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
        at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:24:41,538 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 3 at election address zoo3/10.0.0.2:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
        at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:38:19,146 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=1, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2018-11-02 15:38:20,147 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=2, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)

A closer look revealed that when the stack was first deployed, the ZooKeeper instance with id 2 was running on node 1. This created a myid file with the value 2:

cat /home/zk/data/myid
2

When I stopped and started the stack again, I found that this time the ZooKeeper instance with id 3 was running on node 1:

docker ps
CONTAINER ID  IMAGE             COMMAND                 CREATED        STATUS        PORTS                                                                    NAMES
566b68c11c8b  zookeeper:3.4.12  "/docker-entrypoin..."  6 minutes ago  Up 6 minutes  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp  zk_zoo3.1.7m0hq684pkmyrm09zmictc5bm

But the myid file still had the value 2, set by the earlier instance.

That is why the log shows [myid:2], and why the instance tries to connect to the instances with ids 1 and 3 and fails.

Debugging further, I found that the docker-entrypoint.sh file contains the following code:

# Write myid only if it doesn't exist
if [[ ! -f "$ZOO_DATA_DIR/myid" ]]; then
    echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
fi

This was the problem in my case. I edited docker-entrypoint.sh as follows:

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi

echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

and mounted this docker-entrypoint.sh in my compose file.

With this fix, I was able to stop and start the stack multiple times, and each time my ZooKeeper cluster formed an ensemble without running into connectivity issues.
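One way to verify the ensemble (not part of the original fix) is ZooKeeper's four-letter `srvr` command, e.g. `echo srvr | nc zoo1 2181`, which reports each member's role. A small helper to pull the role out of that output:

```shell
# Extract the "Mode:" line (leader/follower/standalone) from the output of
# ZooKeeper's four-letter "srvr" command, read on stdin.
zk_mode() {
    sed -n 's/^Mode: //p'
}
```

A healthy three-node ensemble should report one leader and two followers.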

My docker-entrypoint.sh file is as follows:

#!/bin/bash

set -e

# Allow the container to be started with `--user`
if [[ "$1" = 'zkServer.sh' && "$(id -u)" = '0' ]]; then
    chown -R "$ZOO_USER" "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR"
    exec su-exec "$ZOO_USER" "$0" "$@"
fi

# Generate the config only if it doesn't exist
if [[ ! -f "$ZOO_CONF_DIR/zoo.cfg" ]]; then
    CONFIG="$ZOO_CONF_DIR/zoo.cfg"

    echo "clientPort=$ZOO_PORT" >> "$CONFIG"
    echo "dataDir=$ZOO_DATA_DIR" >> "$CONFIG"
    echo "dataLogDir=$ZOO_DATA_LOG_DIR" >> "$CONFIG"

    echo "tickTime=$ZOO_TICK_TIME" >> "$CONFIG"
    echo "initLimit=$ZOO_INIT_LIMIT" >> "$CONFIG"
    echo "syncLimit=$ZOO_SYNC_LIMIT" >> "$CONFIG"

    echo "maxClientCnxns=$ZOO_MAX_CLIENT_CNXNS" >> "$CONFIG"

    for server in $ZOO_SERVERS; do
        echo "$server" >> "$CONFIG"
    done
fi

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi

echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

exec "$@"

My docker compose file is as follows:

version: '3.3'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
    zoo3:
        image: zookeeper:3.4.12
        hostname: zoo3
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 3
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
networks:
    net:

With this, I can get the ZooKeeper instances up and running in Docker Swarm mode without hard-coding any hostnames in the compose file. If one of my nodes goes down, the services are started on whatever node is available in the swarm, without any issues.

Thanks
