
Running Spark on YARN with HA

I have an HA-enabled YARN cluster with two ResourceManagers. The problem is that Spark always tries to connect to the first ResourceManager, even when it is in standby mode.

The YARN version is 2.6 and the Spark version is 1.4.1.

yarn-site.xml

<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop-0:8050</value>
</property>

<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>hadoop-0:8141</value>
</property>

<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
</property>

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>

<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-0</value>
</property>

<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>hadoop-0</value>
</property>

<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>hadoop-4</value>
</property>
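In the config above, the unsuffixed yarn.resourcemanager.address and yarn.resourcemanager.hostname both point only at hadoop-0. HA setups typically also declare each ResourceManager's endpoints under keys suffixed with its rm-id, so that clients have an address for rm2 as well as rm1. The sketch below assumes hadoop-4 listens on the same ports as hadoop-0 (8050 and 8141 copied from the config above, 8030 being YARN's default scheduler port, which is the port in the log). The cluster-id value yarn-cluster is a hypothetical placeholder. This is an illustration of the HA key layout, not a confirmed fix for this cluster.

```xml
<!-- Sketch: rm-id-suffixed endpoints for both ResourceManagers.
     Assumption: hadoop-4 uses the same ports as hadoop-0. -->
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>hadoop-0:8050</value>
</property>

<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>hadoop-4:8050</value>
</property>

<!-- Scheduler endpoint (default port 8030) used by ApplicationMasters. -->
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>hadoop-0:8030</value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>hadoop-4:8030</value>
</property>

<property>
  <name>yarn.resourcemanager.admin.address.rm1</name>
  <value>hadoop-0:8141</value>
</property>

<property>
  <name>yarn.resourcemanager.admin.address.rm2</name>
  <value>hadoop-4:8141</value>
</property>

<!-- Hypothetical cluster id; both RMs must use the same value. -->
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
```

Spark only picks these settings up if HADOOP_CONF_DIR or YARN_CONF_DIR points at the directory containing this yarn-site.xml when the job is submitted.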

Log:

16/07/03 05:44:56 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:44:57 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:44:58 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:44:59 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:00 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:01 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:02 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:03 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:04 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:05 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:06 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 10 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:07 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 11 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:08 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 12 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:09 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 13 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:10 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 14 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:10 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/03 05:45:11 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 15 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:12 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 16 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:13 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 17 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:14 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 18 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:15 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 19 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:16 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 20 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:17 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 21 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:18 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 22 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:19 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 23 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:20 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 24 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:21 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 25 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:22 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 26 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:23 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 27 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:24 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 28 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:25 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 29 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:25 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/03 05:45:26 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 30 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:27 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 31 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:28 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 32 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:29 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 33 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:30 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 34 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:31 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 35 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:32 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 36 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:33 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 37 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:34 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 38 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:35 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 39 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:36 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 40 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:37 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 41 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:38 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 42 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:39 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 43 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)
16/07/03 05:45:40 INFO ipc.Client: Retrying connect to server: hadoop-0/10.240.0.15:8030. Already tried 44 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1 SECONDS)

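This is expected behavior: because of the YARN client failover mechanism, the client connects to the ResourceManagers in round-robin order, so it always tries the first RM (rm1) first, even if that RM is in standby.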
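The round-robin behavior comes from the client-side failover proxy provider, which walks the ids listed in yarn.resourcemanager.ha.rm-ids in order (rm1, then rm2, then back to rm1). A sketch of the relevant client settings, using property names from Hadoop 2.6's yarn-default.xml: the provider class shown is already the default, and the value 15 for failover-max-attempts is an illustrative assumption, not a tuned recommendation.

```xml
<!-- Client-side RM failover (only used when HA is enabled).
     This provider class is the default; shown here for clarity. -->
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>

<!-- Connect retries against one RM before failing over to the next (default 0). -->
<property>
  <name>yarn.client.failover-retries</name>
  <value>0</value>
</property>

<!-- Illustrative cap on failover attempts; when unset, it is derived
     from yarn.resourcemanager.connect.max-wait.ms. -->
<property>
  <name>yarn.client.failover-max-attempts</name>
  <value>15</value>
</property>
```

With per-attempt retries kept low, a client that starts on a standby rm1 should move on to rm2 quickly rather than retrying the same host many times.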
