[英]Change Mesos Master Leader, cause Marathon shutdown?
Env: 信封:
Zookeeper on computer A,
Mesos master on computer B as Leader,
Mesos master on computer C,
Marathon on computer B singleton.
Action: 行动:
Kill Mesos master task on computer B, attempt to change mesos cluster leader
Result: 结果:
Mesos cluster leader change to mesos master on computer C,
But Marathon task on computer auto shutdown with following logs.
Question: 题:
Somebody can help me why marathon down? 有人可以帮助我为什么马拉松下来? and how to fix it!
以及如何解决!
Logs: 日志:
I1109 12:19:10.010197 11287 detector.cpp:152] Detected a new leader: (id='9')
I1109 12:19:10.010646 11291 group.cpp:699] Trying to get '/mesos/json.info_0000000009' in ZooKeeper
I1109 12:19:10.013425 11292 zookeeper.cpp:262] A new leading master (UPID=master@10.4.23.55:5050) is detected
[2017-11-09 12:19:10,015] WARN Disconnected (mesosphere.marathon.MarathonScheduler:Thread-23)
I1109 12:19:10.018977 11292 sched.cpp:2021] Asked to stop the driver
I1109 12:19:10.019161 11292 sched.cpp:336] New master detected at master@10.4.23.55:5050
I1109 12:19:10.019892 11292 sched.cpp:1203] Stopping framework d52cbd8c-1015-4d94-8328-e418876ca5b2-0000
[2017-11-09 12:19:10,020] INFO Driver future completed with result=Success(()). (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,022] INFO Abdicating leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,022] INFO Stopping the election service (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,029] INFO backgroundOperationsLoop exiting (org.apache.curator.framework.imps.CuratorFrameworkImpl:Curator-Framework-0)
[2017-11-09 12:19:10,061] INFO Session: 0x15f710ffb010058 closed (org.apache.zookeeper.ZooKeeper:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,061] INFO EventThread shut down for session: 0x15f710ffb010058 (org.apache.zookeeper.ClientCnxn:pool-3-thread-1-EventThread)
[2017-11-09 12:19:10,063] INFO Stopping MarathonSchedulerService [RUNNING]'s leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,063] INFO Lost leadership (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,066] INFO All actors suspended:
* Actor[akka://marathon/user/offerMatcherStatistics#-1904211014]
* Actor[akka://marathon/user/reviveOffersWhenWanted#-238627718]
* Actor[akka://marathon/user/expungeOverdueLostTasks#608979053]
* Actor[akka://marathon/user/launchQueue#803590575]
* Actor[akka://marathon/user/offersWantedForReconciliation#598482724]
* Actor[akka://marathon/user/offerMatcherLaunchTokens#813230776]
* Actor[akka://marathon/user/offerMatcherManager#1205401692]
* Actor[akka://marathon/user/instanceTracker#1055980147]
* Actor[akka://marathon/user/killOverdueStagedTasks#-40058350]
* Actor[akka://marathon/user/taskKillServiceActor#-602552505]
* Actor[akka://marathon/user/rateLimiter#-911383474]
* Actor[akka://marathon/user/deploymentManager#2013376325] (mesosphere.marathon.core.leadership.impl.LeadershipCoordinatorActor:marathon-akka.actor.default-dispatcher-10)
I1109 12:19:10.069551 11272 sched.cpp:2021] Asked to stop the driver
[2017-11-09 12:19:10,068] INFO Stopping driver (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,069] INFO Stopped MarathonSchedulerService [RUNNING]'s leadership (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,070] INFO Terminating due to leadership abdication or failure (mesosphere.marathon.core.election.impl.CuratorElectionService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,071] INFO Call postDriverRuns callbacks on (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,074] INFO Now standing by. Closing existing handles and rejecting new. (mesosphere.marathon.core.event.impl.stream.HttpEventStreamActor:marathon-akka.actor.default-dispatcher-12)
[2017-11-09 12:19:10,074] INFO Suspending scheduler actor (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-2)
[2017-11-09 12:19:10,083] INFO Finished postDriverRuns callbacks (mesosphere.marathon.MarathonSchedulerService:ForkJoinPool-3-worker-5)
[2017-11-09 12:19:10,084] INFO ExpungeOverdueLostTasksActor has stopped (mesosphere.marathon.core.task.jobs.impl.ExpungeOverdueLostTasksActor:marathon-akka.actor.default-dispatcher-9)
[1]+ Exit 137
I think there is wrong configuration in zookeeper cluster. 我认为Zookeeper群集中的配置错误。 Use 3 zookeeper cluster and 2 mesos master n multiple slaves.
使用3个zookeeper集群和2个mesos主节点n个从节点。 Ref : https://www.google.co.in/amp/s/beingasysadmin.wordpress.com/2014/08/16/managing-ha-docker-cluster-using-multiple-mesos-masters/amp/
参考: https : //www.google.co.in/amp/s/beingasysadmin.wordpress.com/2014/08/16/managing-ha-docker-cluster-using-multiple-mesos-masters/amp/
Did you set masters reference to marathon conf? 您是否设置了大师参考马拉松大会? can you do
你可以做
cat /etc/marathon/conf/master
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.