
Does the Mesos cluster become inaccessible when a Mesos master and agent go down at the same time?

I'm trying to achieve HA with three machines, with masters and slaves laid out as below. I'm using VMs for a local test setup; my observations follow.

Case 1:

m1 -> leader master

m2 -> non-leader master, slave1

m3 -> non-leader master, slave2

  • Case1.1: When I power off VM m1, one of the non-leading masters becomes the leader, the cluster remains accessible, and everything works properly.

  • Case1.2: When I power off m2 or m3 (either VM that runs a non-leading master together with a slave), I see the message 'No master is currently leading' on the web page when I try to access Mesos on m1 or on the remaining machine (m2 or m3).

Case2:

m1 -> non-leader master

m2 -> leader master, slave1

m3 -> non-leader master, slave2

  • Case2.1: When I power off VM m1, the leader on m2 stays the leader and the cluster works properly.

  • Case2.2: When I power off m2 (the leader with slave1), the cluster becomes unavailable with the error message 'No master is currently leading' on the web page.

  • Case2.3: When I power off m3 (a non-leading master with slave2), the cluster becomes unavailable with the error message 'No master is currently leading' on the web page.

Apologies for trying HA with only three machines, and for the lengthy problem explanation.

Questions:

  • Does killing a machine that runs both a master (leading or non-leading) and a slave always lead to cluster unavailability? (Cases 1.2, 2.2, 2.3)

  • Can we achieve HA with three machines as above, i.e. 3 masters and 2 slaves, with masters and slaves co-located on the same machines?

    The configuration is as follows.

Masters:

m1 : mesos-master --ip=192.168.1.36 --hostname=192.168.1.36 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs

m2 : mesos-master --ip=192.168.1.42 --hostname=192.168.1.42 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs

m3 : mesos-master --ip=192.168.1.45 --hostname=192.168.1.45 --port=6060 --quorum=2 --cluster=mesosCluster --zk=zk://192.168.1.36:2181,192.168.1.42:2181,192.168.1.45:2181/mesos --work_dir=/opt/ncms/mesosWorkDir/ --log_dir=/opt/ncms/mesosWorkDir/logs
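
To see which of the three masters is currently leading, you can query any master's state endpoint from the command line (a minimal check, assuming curl and jq are installed on the VMs; port 6060 matches the --port flag above):

# Ask any master for its view of the current leader; an empty result means no leader is elected
curl -s http://192.168.1.36:6060/master/state | jq '.leader'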

Slaves:

m2 : mesos-slave --ip=192.168.1.42 --hostname=192.168.1.42 --executor_registration_timeout=10mins --systemd_enable_support=false --master=zk://192.168.1.42:2181,192.168.1.45:2181,192.168.1.36:2181/mesos --containerizers=mesos,docker

m3 : mesos-slave --ip=192.168.1.45 --hostname=192.168.1.45 --executor_registration_timeout=10mins --systemd_enable_support=false --master=zk://192.168.1.42:2181,192.168.1.45:2181,192.168.1.36:2181/mesos --containerizers=mesos,docker
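
To confirm that both slaves have registered, you can query the /master/slaves endpoint (again a rough check with curl and jq; run it against whichever master is currently leading):

# List the agents the leading master knows about
curl -s http://192.168.1.42:6060/master/slaves | jq '.slaves[].hostname'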

ZooKeeper config:

tickTime=2000

initLimit=10

syncLimit=5

dataDir=/opt/ncms/zkWorkDir

clientPort=2181

server.1=192.168.1.42:2888:3888

server.3=192.168.1.36:2888:3888

server.5=192.168.1.45:2888:3888
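
Note that each ZooKeeper server also needs a myid file in dataDir whose contents match its server.N id above; a mismatch can prevent the ensemble from forming. A sketch based on the ids in this config (run each line only on the corresponding machine):

echo 1 > /opt/ncms/zkWorkDir/myid   # on m2 (server.1, 192.168.1.42)
echo 3 > /opt/ncms/zkWorkDir/myid   # on m1 (server.3, 192.168.1.36)
echo 5 > /opt/ncms/zkWorkDir/myid   # on m3 (server.5, 192.168.1.45)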

Setup:

Host: Windows 7 (64 GB RAM, 24 cores)

VirtualBox: each VM (m1, m2, m3) has 2 cores and 2 GB RAM, running RHEL 7.2

In the scenarios you describe, the number of active masters falls below the quorum, which is 2 in your case. This is considered an exceptional situation, and certain operations will not succeed, for example any operation that modifies the distributed registry.
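
If you want to verify this after powering off a VM, you can check whether the ZooKeeper ensemble still has quorum (a rough sketch using the standard ZooKeeper four-letter 'stat' command, assuming netcat is available; point it at a surviving machine), together with the /master/state leader check shown above:

# 'stat' only reports Mode: leader/follower while the ensemble has quorum
echo stat | nc 192.168.1.36 2181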


 