
Spark on Hadoop YARN - executor missing

I have a cluster of 3 macOS machines running Hadoop and Spark 1.5.2 (the same problem also exists with Spark 2.0.0). With 'yarn' as the Spark master URL, I am running into a strange issue where tasks are only allocated to 2 of the 3 machines.

Based on the Hadoop dashboard (port 8088 on the master) it is clear that all 3 nodes are part of the cluster. However, any Spark job I run only uses 2 executors.

For example, here is the "Executors" tab from a lengthy run of the JavaWordCount example: [screenshot of the Executors tab]. "batservers" is the master. There should be an additional slave, "batservers2", but it's just not there.

Why might this be?

Note that none of my YARN or Spark (or, for that matter, HDFS) configurations are unusual, apart from provisions for giving the YARN ResourceManager and NodeManagers extra memory.

Remarkably, all it took was a detailed look at the spark-submit help message to discover the answer:

YARN-only:
  ...
  --num-executors NUM         Number of executors to launch (Default: 2).

If I specify --num-executors 3 in my spark-submit command, the 3rd node is used.
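
For reference, a minimal spark-submit invocation along these lines might look like the sketch below. The class name is the stock JavaWordCount example that ships with Spark; the jar path, input path, and memory/core settings are placeholders that would need to be adapted to the actual cluster.

  # request one executor per worker node; without this flag only the default 2 are launched
  spark-submit \
    --master yarn \
    --deploy-mode client \
    --class org.apache.spark.examples.JavaWordCount \
    --num-executors 3 \
    --executor-memory 1g \
    --executor-cores 1 \
    /path/to/spark-examples.jar \
    hdfs:///path/to/input.txt

The --num-executors flag corresponds to the spark.executor.instances property, so the same setting can also be made persistent in spark-defaults.conf instead of passing it on every submit.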
