
Etcd cluster setup failure

I am trying to set up a 3-node etcd cluster on Ubuntu machines as the Docker data store for networking. I successfully created an etcd cluster using the etcd Docker image. Now, when I try to replicate it, the steps fail on one node. Even after removing the failing node from the setup, the cluster still looks for the removed node. I get the same error when I use the etcd binary.

I used the following command on all nodes, changing the IP accordingly:

docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
 --name etcd quay.io/coreos/etcd \
 -name etcd0 \
 -advertise-client-urls http://172.27.59.141:2379,http://172.27.59.141:4001 \
 -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
 -initial-advertise-peer-urls http://172.27.59.141:2380 \
 -listen-peer-urls http://0.0.0.0:2380 \
 -initial-cluster-token etcd-cluster-1 \
 -initial-cluster etcd0=http://172.27.59.141:2380,etcd1=http://172.27.59.244:2380,etcd2=http://172.27.59.232:2380 \
 -initial-cluster-state new

Two of the nodes connect properly, but the service on the third node stops. Following is the log of the third node:

2016-06-16 17:16:34.293248 I | etcdmain: etcd Version: 2.3.6
2016-06-16 17:16:34.294368 I | etcdmain: Git SHA: 128344c
2016-06-16 17:16:34.294584 I | etcdmain: Go Version: go1.6.2
2016-06-16 17:16:34.294781 I | etcdmain: Go OS/Arch: linux/amd64
2016-06-16 17:16:34.294962 I | etcdmain: setting maximum number of CPUs to 2, total number of available CPUs is 2
2016-06-16 17:16:34.295142 W | etcdmain: no data-dir provided, using default data-dir ./node2.etcd
2016-06-16 17:16:34.295438 I | etcdmain: listening for peers on http://0.0.0.0:2380
2016-06-16 17:16:34.295654 I | etcdmain: listening for client requests on http://0.0.0.0:2379
2016-06-16 17:16:34.295846 I | etcdmain: listening for client requests on http://0.0.0.0:4001
2016-06-16 17:16:34.296193 I | etcdmain: stopping listening for client requests on http://0.0.0.0:4001
2016-06-16 17:16:34.301139 I | etcdmain: stopping listening for client requests on http://0.0.0.0:2379
2016-06-16 17:16:34.301454 I | etcdmain: stopping listening for peers on http://0.0.0.0:2380
2016-06-16 17:16:34.301718 I | etcdmain: --initial-cluster must include node2=http://172.27.59.232:2380 given --initial-advertise-peer-urls=http://172.27.59.232:2380
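
The last log line says that the member name (node2) paired with its advertised peer URL does not appear in --initial-cluster, while the command shown earlier lists the members as etcd0, etcd1 and etcd2. A minimal sketch of that check in plain shell follows; name_in_initial_cluster is a hypothetical helper written for illustration, not part of etcd:

```shell
#!/bin/sh
# Hypothetical helper (not an etcd command): returns 0 when "NAME=PEER_URL"
# is one of the comma-separated entries of INITIAL_CLUSTER, 1 otherwise.
name_in_initial_cluster() {
    name=$1
    peer_url=$2
    initial_cluster=$3
    # Wrap the list in commas so every entry is delimited on both sides.
    case ",$initial_cluster," in
        *",$name=$peer_url,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Values copied from the command and the log above.
cluster='etcd0=http://172.27.59.141:2380,etcd1=http://172.27.59.244:2380,etcd2=http://172.27.59.232:2380'

if ! name_in_initial_cluster node2 http://172.27.59.232:2380 "$cluster"; then
    echo "node2 is not listed in --initial-cluster"
fi
```

The name passed with -name and the name used in the -initial-cluster entry for that node have to be the same string, so either rename the member or rename the entry.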

Even after removing the failing node, I can see that the two nodes are waiting for the third node to connect:

2016-06-16 17:16:12.063893 N | etcdserver: added member 17879927ec74147b [http://172.27.59.232:238] to cluster ba4424e006edb53e
2016-06-16 17:16:12.064431 N | etcdserver: added local member 24d9feabb7e2f26f [http://172.27.59.244:2380] to cluster ba4424e006edb53e
2016-06-16 17:16:12.065229 N | etcdserver: added member 2bda70be57138cfe [http://172.27.59.141:2380] to cluster ba4424e006edb53e
2016-06-16 17:16:12.218560 I | raft: 24d9feabb7e2f26f [term: 1] received a MsgVote message with higher term from 2bda70be57138cfe [term: 29]
2016-06-16 17:16:12.218964 I | raft: 24d9feabb7e2f26f became follower at term 29
2016-06-16 17:16:12.219276 I | raft: 24d9feabb7e2f26f [logterm: 1, index: 3, vote: 0] voted for 2bda70be57138cfe [logterm: 1, index: 3] at term 29
2016-06-16 17:16:12.222667 I | raft: raft.node: 24d9feabb7e2f26f elected leader 2bda70be57138cfe at term 29
2016-06-16 17:16:12.335904 I | etcdserver: published {Name:node1 ClientURLs:[http://172.27.59.244:2379 http://172.27.59.244:4001]} to cluster ba4424e006edb53e
2016-06-16 17:16:12.336459 N | etcdserver: set the initial cluster version to 2.2
2016-06-16 17:16:42.059177 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy
2016-06-16 17:17:12.060313 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy
2016-06-16 17:17:42.060986 W | rafthttp: the connection to peer 17879927ec74147b is unhealthy

It can be seen that, despite starting the cluster with two nodes, it is still searching for the third node.

Is there a location on local disk where data is being saved, so that it is picking up old data even though that data was not provided?

Please suggest what I am missing.

"Is there a location on local disk where data is being saved, so that it is picking up old data even though that data was not provided?"

Yes, the membership data is already stored in node0.etcd and node1.etcd.

You can find the following message in the log, which indicates that the server already belongs to a cluster:

etcdmain: the server is already initialized as member before, starting as etcd member...
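
To see whether a data directory already carries that earlier state, a quick check like the one below may help. It is only a sketch: it assumes that etcd 2.x keeps its membership state in a member subdirectory of the data dir, and has_member_data is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when DATA_DIR contains a "member"
# subdirectory, which is assumed here to hold etcd 2.x membership state.
has_member_data() {
    [ -d "$1/member" ]
}

# node0.etcd and node1.etcd are the default data dirs named above;
# run this from the directory where etcd was started.
for dir in node0.etcd node1.etcd; do
    if has_member_data "$dir"; then
        echo "$dir already holds membership data"
    fi
done
```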

To run a new cluster with two members, just add another argument to your command:

--data-dir bak
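
Put together, the two-member start could look roughly like this with the etcd binary. Treat it as a sketch under assumptions: start_member is a hypothetical wrapper, the mapping of node0 to 172.27.59.141 is assumed (the logs only show node1 on 172.27.59.244), and the new token value etcd-cluster-2 is my own choice to keep the new cluster distinct from the old one.

```shell
#!/bin/sh
# Sketch only: start one member of a fresh two-node cluster in a new data dir.
# start_member is a hypothetical wrapper; the node0 address and the token
# "etcd-cluster-2" are assumptions, not values from the original post.
start_member() {
    name=$1
    ip=$2
    etcd --name "$name" \
        --data-dir bak \
        --advertise-client-urls "http://$ip:2379,http://$ip:4001" \
        --listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
        --initial-advertise-peer-urls "http://$ip:2380" \
        --listen-peer-urls http://0.0.0.0:2380 \
        --initial-cluster-token etcd-cluster-2 \
        --initial-cluster node0=http://172.27.59.141:2380,node1=http://172.27.59.244:2380 \
        --initial-cluster-state new
}

# On 172.27.59.141:  start_member node0 172.27.59.141
# On 172.27.59.244:  start_member node1 172.27.59.244
```

The relative path bak is resolved against the directory etcd is started from, so an absolute path is safer if the start directory can vary. The same flags can be appended to the docker run command instead of calling the binary.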
